Research Article

Unsupervised Speaker Segmentation using Autoassociative Neural Network

by S. Jothilakshmi, S. Palanivel, V. Ramalingam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 7
Year of Publication: 2010
DOI: 10.5120/167-293

S. Jothilakshmi, S. Palanivel, V. Ramalingam. Unsupervised Speaker Segmentation using Autoassociative Neural Network. International Journal of Computer Applications. 1, 7 (February 2010), 24-30. DOI=10.5120/167-293

@article{ 10.5120/167-293,
author = { S. Jothilakshmi and S. Palanivel and V. Ramalingam },
title = { Unsupervised Speaker Segmentation using Autoassociative Neural Network },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 7 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 24-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume1/number7/167-293/ },
doi = { 10.5120/167-293 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:44:54.454566+05:30
%A S. Jothilakshmi
%A S. Palanivel
%A V. Ramalingam
%T Unsupervised Speaker Segmentation using Autoassociative Neural Network
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 7
%P 24-30
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we propose an unsupervised approach to speaker segmentation using an autoassociative neural network (AANN). Speaker segmentation aims to find the points in a speech signal where the speaker changes; it is an important preprocessing step for audio indexing, spoken document retrieval and multi-speaker diarization. The method extracts speaker-specific information from Mel frequency cepstral coefficients (MFCC), and speaker change points are detected using the distribution-capturing ability of the AANN model. Experiments are carried out on different audio databases, and the method is capable of detecting speaker changes from short durations of speech in an unsupervised manner.
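The abstract describes a pipeline of frame-level features, an AANN that captures the distribution of one stretch of speech, and a change decision when a neighbouring stretch fits that model poorly. The pure-Python sketch below illustrates that principle only, under assumptions that are not taken from the paper: synthetic 3-D vectors stand in for MFCC frames, the network is a small d-k-d autoassociative net rather than the authors' architecture, per-frame confidence is taken as exp(-reconstruction error), and the window size, step, and symmetric score (train on the left window and test on the right, then the reverse) are hypothetical choices. The names `AANN`, `mean_confidence`, `detect_change` and `synthetic_frames` are illustrative.

```python
import math
import random


class AANN:
    """Small autoassociative network: d inputs -> k tanh units -> d linear outputs.

    Simplified stand-in (assumption): the paper's AANN has its own, larger
    architecture; this one only illustrates the principle.
    """

    def __init__(self, d, k, rng):
        self.d, self.k = d, k
        # Small random initial weights (hypothetical choice).
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(k)]
        self.b1 = [0.0] * k
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(d)]
        self.b2 = [0.0] * d

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def error(self, x):
        """Squared reconstruction error ||y - x||^2 for one frame."""
        _, y = self.forward(x)
        return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

    def train(self, frames, rng, epochs=100, lr=0.05):
        """Plain SGD on the squared reconstruction error (input = target)."""
        order = list(range(len(frames)))
        for _ in range(epochs):
            rng.shuffle(order)
            for idx in order:
                x = frames[idx]
                h, y = self.forward(x)
                dy = [2.0 * (yi - xi) for yi, xi in zip(y, x)]
                # Backpropagate through the tanh layer using the pre-update w2.
                dh = [sum(dy[i] * self.w2[i][j] for i in range(self.d)) * (1.0 - h[j] ** 2)
                      for j in range(self.k)]
                for i in range(self.d):
                    for j in range(self.k):
                        self.w2[i][j] -= lr * dy[i] * h[j]
                    self.b2[i] -= lr * dy[i]
                for j in range(self.k):
                    for m in range(self.d):
                        self.w1[j][m] -= lr * dh[j] * x[m]
                    self.b1[j] -= lr * dh[j]


def mean_confidence(model, frames):
    """Average per-frame confidence, assumed here to be exp(-reconstruction error)."""
    return sum(math.exp(-model.error(x)) for x in frames) / len(frames)


def detect_change(frames, win=30, step=10, hidden=1, seed=0):
    """Return the candidate boundary with the lowest two-way confidence, plus all scores.

    For each candidate p, one AANN is trained on the left window and tested on the
    right, and another the other way round (a hypothetical symmetric score).
    """
    rng = random.Random(seed)
    d = len(frames[0])
    scores = {}
    for p in range(win, len(frames) - win + 1, step):
        left, right = frames[p - win:p], frames[p:p + win]
        fwd = AANN(d, hidden, rng)
        fwd.train(left, rng)
        bwd = AANN(d, hidden, rng)
        bwd.train(right, rng)
        scores[p] = mean_confidence(fwd, right) + mean_confidence(bwd, left)
    return min(scores, key=scores.get), scores


def synthetic_frames(n_a=60, n_b=60, seed=1):
    """Hypothetical 3-D 'feature' frames standing in for MFCC vectors.

    Speaker A varies along direction u, speaker B along an orthogonal direction v.
    """
    rng = random.Random(seed)
    u, v = (0.6, 0.8, 0.0), (0.0, 0.0, 1.0)
    frames = []
    for n, direction in ((n_a, u), (n_b, v)):
        for _ in range(n):
            t = rng.choice((-1.0, 1.0)) * rng.uniform(0.8, 1.0)
            frames.append([t * c + rng.gauss(0.0, 0.05) for c in direction])
    return frames


if __name__ == "__main__":
    frames = synthetic_frames()
    change, scores = detect_change(frames)
    print("detected change at frame", change)
```

Because the two synthetic speakers vary along orthogonal directions, a network trained on one reconstructs the other poorly, so the lowest combined confidence should fall at the true boundary (frame 60). The sketch leaves out MFCC extraction, the paper's network configuration and any decision threshold; those would have to come from the paper itself.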

Index Terms

Computer Science
Information Sciences

Keywords

Audio indexing, Speaker segmentation, Mel frequency cepstral coefficients, Autoassociative neural networks