Unsupervised Speaker Segmentation using Autoassociative Neural Network

S. Jothilakshmi; S. Palanivel; V. Ramalingam

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 1 - Number 7

Year of Publication: 2010

Authors: S. Jothilakshmi, S. Palanivel, V. Ramalingam

10.5120/167-293

S. Jothilakshmi, S. Palanivel, V. Ramalingam. Unsupervised Speaker Segmentation using Autoassociative Neural Network. International Journal of Computer Applications. 1, 7 (February 2010), 24-30. DOI=10.5120/167-293

@article{10.5120/167-293,
  author = {S. Jothilakshmi and S. Palanivel and V. Ramalingam},
  title = {Unsupervised Speaker Segmentation using Autoassociative Neural Network},
  journal = {International Journal of Computer Applications},
  issue_date = {February 2010},
  volume = {1},
  number = {7},
  month = {February},
  year = {2010},
  issn = {0975-8887},
  pages = {24-30},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume1/number7/167-293/},
  doi = {10.5120/167-293},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T19:44:54.454566+05:30
%A S. Jothilakshmi
%A S. Palanivel
%A V. Ramalingam
%T Unsupervised Speaker Segmentation using Autoassociative Neural Network
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 7
%P 24-30
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In this paper we propose an unsupervised approach to speaker segmentation using an autoassociative neural network (AANN). Speaker segmentation aims to find the speaker change points in a speech signal, an important preprocessing step for audio indexing, spoken document retrieval and multi-speaker diarization. The method extracts speaker-specific information from Mel frequency cepstral coefficients (MFCC) and detects speaker change points using the distribution-capturing ability of the AANN model. Experiments on different audio databases show that the method can detect speaker changes from short durations of speech in an unsupervised manner.
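
The abstract does not spell out the detection procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a small autoassociative network is trained on the frames to the left of a candidate boundary and that the frames to the right are scored by how well that network reconstructs them. The three-layer network, the confidence score exp(-error), the window sizes, and the helper names (train_aann, confidence, detect_change) are all illustrative assumptions. Synthetic Gaussian vectors stand in for real MFCC frames.

```python
import numpy as np

def train_aann(X, hidden=3, epochs=300, lr=0.05, seed=0):
    """Fit a small autoassociative net (tanh bottleneck, linear output) to reproduce X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # bottleneck activations
        E = H @ W2 + b2 - X                 # reconstruction error per frame
        dH = (E @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ E) / n
        b2 -= lr * E.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def confidence(model, X):
    """Per-frame score exp(-squared reconstruction error); near 1 means a good fit."""
    W1, b1, W2, b2 = model
    recon = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.exp(-np.sum((X - recon) ** 2, axis=1))

def detect_change(frames, win=50, step=10):
    """Return the candidate boundary whose right window fits the left-window model worst."""
    candidates = range(win, len(frames) - win + 1, step)
    scores = [confidence(train_aann(frames[t - win:t]), frames[t:t + win]).mean()
              for t in candidates]
    return candidates[int(np.argmin(scores))]

# Synthetic stand-in for MFCC frames (assumption): two "speakers" with different means.
rng = np.random.default_rng(42)
dim = 6
speaker_a = rng.normal(0.0, 0.3, (200, dim))
speaker_b = rng.normal(1.5, 0.3, (200, dim))
frames = np.vstack([speaker_a, speaker_b])

change_frame = detect_change(frames)
print("estimated change near frame", change_frame)
```

In this toy stream the true change lies at frame 200, and the lowest-scoring boundary is expected to fall close to it. Real MFCC features from different speakers overlap far more than these synthetic clusters, so window length, network size, and any decision threshold would need tuning on actual data.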

Index Terms

Computer Science

Information Sciences

Keywords

Audio indexing, Speaker segmentation, Mel frequency cepstral coefficients, Autoassociative neural networks