
A Novel Method for Indexing and Retrieval of Speech using Gaussian Mixture Model Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Authors:
R. Thiruvengatanadhan, P. Dhanalakshmi
DOI: 10.5120/ijca2016911098

R. Thiruvengatanadhan and P. Dhanalakshmi. A Novel Method for Indexing and Retrieval of Speech using Gaussian Mixture Model Techniques. International Journal of Computer Applications 148(3):42-47, August 2016.

BibTeX:

@article{10.5120/ijca2016911098,
	author = {R. Thiruvengatanadhan and P. Dhanalakshmi},
	title = {A Novel Method for Indexing and Retrieval of Speech using Gaussian Mixture Model Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {August 2016},
	volume = {148},
	number = {3},
	month = {Aug},
	year = {2016},
	issn = {0975-8887},
	pages = {42-47},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume148/number3/25741-2016911098},
	doi = {10.5120/ijca2016911098},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Large speech databases such as television broadcasts, radio broadcasts, CDs and DVDs are available online these days. Research on speech indexing and retrieval has received much attention recently because of the huge storage capacity now available for multimedia data. The goal of a speech indexing and retrieval system is to let the user index and retrieve speech archives efficiently. In this paper, we propose a method for indexing and retrieval of speech. Speech activity is identified using voice activity detection, and each complete speech dialogue is separated into individual words by marking each word's segment via the Root Mean Square (RMS) energy envelope. The features Perceptual Linear Prediction (PLP), Power Normalized Cepstral Coefficients (PNCC), Subband Coding (SBC) and Sonogram are then extracted from each individual word. For retrieval, a novel method using Gaussian mixture models is proposed: the probability that a query feature vector belongs to each Gaussian is computed, the average probability density function value is computed against each of the feature vectors in the database, and retrieval is based on the highest probability.
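The two core steps of the abstract — RMS-envelope word marking and GMM-likelihood retrieval — can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the frame sizes, the use of diagonal covariances, and the toy model parameters are all assumptions for illustration only.

```python
import numpy as np

def rms_envelope(signal, frame_len=256, hop=128):
    # Frame-wise RMS energy; peaks/valleys of this envelope are what
    # the paper uses to mark individual word segments. Frame and hop
    # sizes here are illustrative assumptions.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

def gaussian_pdf(x, mean, var):
    # Density of a diagonal-covariance Gaussian evaluated at x.
    d = x.shape[-1]
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.prod(var))
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var, axis=-1))

def gmm_pdf(x, weights, means, variances):
    # Mixture density: weighted sum of the component Gaussians.
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def retrieve(query_feats, database_gmms):
    # For each stored GMM, average the pdf values of the query feature
    # vectors; retrieval returns the model with the highest average,
    # matching the "highest probability" criterion in the abstract.
    scores = [np.mean([gmm_pdf(x, w, m, v) for x in query_feats])
              for (w, m, v) in database_gmms]
    return int(np.argmax(scores))
```

A query whose feature vectors lie near one stored model's mean will score highest against that model, which is the retrieval decision the paper describes.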

Keywords

Voice Activity Detection, Root Mean Square, Gaussian Mixture Model, Probability Density Function