Research Article

Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC

by Satyanand Singh, Dr. E.G. Rajan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 17 - Number 1
Year of Publication: 2011
Authors: Satyanand Singh, Dr. E.G. Rajan
10.5120/2188-2774

Satyanand Singh, Dr. E.G. Rajan. Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC. International Journal of Computer Applications. 17, 1 (March 2011), 1-7. DOI=10.5120/2188-2774

@article{ 10.5120/2188-2774,
author = { Satyanand Singh, Dr. E.G. Rajan },
title = { Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC },
journal = { International Journal of Computer Applications },
issue_date = { March 2011 },
volume = { 17 },
number = { 1 },
month = { March },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume17/number1/2188-2774/ },
doi = { 10.5120/2188-2774 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:04:28.308329+05:30
%A Satyanand Singh
%A Dr. E.G. Rajan
%T Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC
%J International Journal of Computer Applications
%@ 0975-8887
%V 17
%N 1
%P 1-7
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The front-end, or feature extractor, is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is strongly determined by the quality of the front-end. In other words, classification can be at most as accurate as the features. Over the years, Mel-Frequency Cepstral Coefficients (MFCC), modeled on the human auditory system, have been used as a standard acoustic feature set for speech-related applications. This paper shows that inverted Mel-Frequency Cepstral Coefficients, which carry complementary information from the high-frequency region, improve speaker recognition performance. The paper also introduces a Gaussian-shaped filter (GF) in place of the traditional triangular-shaped bins when calculating MFCC and inverted MFCC. The main idea is to introduce a higher amount of correlation between subband outputs. The performance of both MFCC and inverted MFCC improves with GF over the traditional triangular filter (TF) based implementation, individually as well as in combination. In this study, the Vector Quantization (VQ) feature matching technique was used because of its high accuracy and simplicity. The proposed approach achieved 98.57% efficiency with a very short test voice sample of 2 seconds.
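
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no parameter values, so the filter count, FFT size, sampling rate, and the rule that sets each Gaussian's width (spacing to the next centre divided by a hypothetical `alpha`) are all assumptions, as are the function names. The inverted bank is obtained by flipping the mel-spaced bank along both the filter and frequency axes, so that resolution is finest at high frequencies. VQ matching is shown as the average distance from each test vector to its nearest codeword.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gaussian_filter_bank(n_filters=20, n_fft=512, sample_rate=8000, alpha=2.0):
    """Gaussian-shaped filters centred on mel-spaced frequencies.

    Each filter's sigma is the distance to the next boundary point divided
    by alpha. This width rule is an illustrative assumption.
    """
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(nyquist), n_filters + 2)
    bin_points = mel_to_hz(mel_points) / nyquist * (n_bins - 1)  # fractional bins
    centres = bin_points[1:-1]
    sigmas = (bin_points[2:] - centres) / alpha
    k = np.arange(n_bins)
    return np.exp(-0.5 * ((k[None, :] - centres[:, None]) / sigmas[:, None]) ** 2)

def inverted_filter_bank(bank):
    # Flip filter order and frequency axis: narrow filters move to high frequencies.
    return bank[::-1, ::-1]

def cepstral_coefficients(power_spectrum, bank, n_ceps=12):
    """Log filter-bank energies followed by a DCT-II (coefficient 0 skipped)."""
    log_energies = np.log(bank @ power_spectrum + 1e-10)
    q = len(log_energies)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), np.arange(q) + 0.5) / q)
    return basis @ log_energies

def vq_distortion(features, codebook):
    """Mean Euclidean distance from each feature vector to its nearest codeword."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion."""
    scores = {name: vq_distortion(features, cb) for name, cb in codebooks.items()}
    return min(scores, key=scores.get)
```

For a frame's power spectrum `p` of length 257, `cepstral_coefficients(p, gaussian_filter_bank())` would give the GF-based MFCC vector, and passing `inverted_filter_bank(gaussian_filter_bank())` instead would give the inverted counterpart. Codebook training (for example with the LBG algorithm) is omitted here.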

Index Terms

Computer Science
Information Sciences

Keywords

GF, Triangular Filter, Subbands, Correlation, MFCC, inverted MFCC, Vector Quantization