Research Article

Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping

by Santosh V. Chapaneri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 40 - Number 3
Year of Publication: 2012
Authors: Santosh V. Chapaneri
10.5120/5022-7167

Santosh V. Chapaneri. Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping. International Journal of Computer Applications. 40, 3 (February 2012), 6-12. DOI=10.5120/5022-7167

@article{10.5120/5022-7167,
  author     = {Santosh V. Chapaneri},
  title      = {Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping},
  journal    = {International Journal of Computer Applications},
  issue_date = {February 2012},
  volume     = {40},
  number     = {3},
  month      = {February},
  year       = {2012},
  issn       = {0975-8887},
  pages      = {6-12},
  numpages   = {9},
  url        = {https://ijcaonline.org/archives/volume40/number3/5022-7167/},
  doi        = {10.5120/5022-7167},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
Abstract

In this paper, we propose novel techniques for feature parameter extraction based on MFCC and feature recognition using the dynamic time warping algorithm, applied to speaker-independent isolated digit recognition. Using the proposed Weighted MFCC (WMFCC), we achieve low computational overhead in the feature recognition stage, since only 13 weighted MFCC coefficients are used instead of the conventional 39 MFCC coefficients that include the delta and double-delta features. To capture the trends or patterns that a feature sequence exhibits during the alignment process, we compute local and global features using the Improved Features for DTW algorithm (IFDTW), rather than using the raw feature values or their estimated derivatives. Experiments on the TI-Digits corpus demonstrate the effectiveness of the proposed techniques, leading to a higher recognition accuracy of 98.13%.
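
The following Python sketch outlines the two stages described above: a 13-dimensional MFCC front end with per-coefficient weighting, followed by a DTW match against reference templates. It is only a minimal illustration under stated assumptions: the actual WMFCC weights and the IFDTW local/global features are defined in the paper itself, so the weighting below is a generic sinusoidal liftering-style placeholder, the alignment is plain DTW, and the librosa dependency is assumed purely as a convenient MFCC front end.

import numpy as np
import librosa  # assumed here only as a convenient MFCC front end

def weighted_mfcc(y, sr, n_mfcc=13):
    """13 MFCCs per frame, each cepstral dimension scaled by a weight.
    The paper's exact WMFCC weighting is not given in the abstract, so a
    sinusoidal liftering-style weight is used here purely as a placeholder.
    Returns an array of shape (n_frames, n_mfcc)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    k = np.arange(1, n_mfcc + 1)
    w = 1.0 + (n_mfcc / 2.0) * np.sin(np.pi * k / n_mfcc)    # placeholder weights
    return (w[:, None] * mfcc).T

def dtw_distance(a, b):
    """Classical DTW over frame-wise Euclidean distances (not the paper's
    IFDTW: it omits the local/global trend features)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Hypothetical usage: classify a test utterance by the smallest DTW distance
# to per-digit reference templates (waveforms loaded e.g. via librosa.load):
#   test = weighted_mfcc(y_test, sr)
#   best_digit = min(templates, key=lambda d: dtw_distance(test, templates[d]))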

References
  1. R. Cox, C. Kamm, L. Rabiner, J. Schroeter, and J. Wilpon, “Speech and language processing for next-millennium communications services”, Proc. of the IEEE, vol. 88, no. 8, Aug 2000
  2. D. Jurafsky, and J. Martin, Speech and Language Processing, Prentice Hall, 2000
  3. J. Tierney, “A study of LPC analysis of speech in additive noise”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 389-397, 1980
  4. A. Paul, D. Das, and M. Kamal, “Bangla speech recognition system using LPC and ANN”, 7th Intl. Conf. Advances in Pattern Recognition, 2009
  5. L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993
  6. S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug 1980
  7. A. Mishra, M. Chandra, A. Biswas, and S. Sharan, “Robust features for connected Hindi digits recognition”, Intl. Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 4, no. 2, pp. 79-90, June 2011
  8. Z. Jun, S. Kwong, W. Gang, and Q. Hong, “Using Mel-frequency cepstral coefficients in missing data technique”, EURASIP Journal on Applied Signal Processing, vol. 2004, no. 3, pp. 340-346, 2004
  9. O. W. Kwon, K. Chan, J. Hao, and T. W. Lee, “Emotion recognition by speech signals”, in Proc. 8th European Conf. Speech Communication and Technology, pp. 125-128, Geneva, Switzerland, 2003
  10. L. Rabiner, B. Juang, S. Levinson, and M. Sondhi, “Recognition of isolated digits using hidden Markov models with continuous mixture densities”, AT&T Tech. Journal, vol. 64, no. 6, 1985
  11. H. Sakoe, and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978
  12. L. Rabiner, A. Rosenberg, and S. Levinson, “Considerations in dynamic time warping algorithms for discrete word recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978
  13. W. Fu, X. Yang, and Y. Wang, “Heart sound diagnosis based on DTW and MFCC”, 3rd IEEE Intl. Congress on Image and Signal Processing, pp. 2920-2923, Oct 2010
  14. F. Yu, E. Chang, Y. Xu, and H. Shum, “Emotion detection from speech to enrich multimedia content”, in Proc. 2nd IEEE Pacific Rim Conf. Multimedia, pp. 550-557, Beijing, China, 2001
  15. S. Singh, and E. Rajan, “Vector Quantization approach for speaker recognition using MFCC and inverted MFCC”, International Journal of Computer Applications, vol. 17, no. 1, Mar 2011
  16. R. Tato, R. Santos, R. Kompe, and J. Pardo, “Emotional space improves emotion recognition”, in Proc. 7th Intl. Conf. Spoken Language Processing, vol. 3, pp. 2029-2032, Denver, USA, 2002
  17. L. Rabiner, and M. Sambur, “An algorithm for determining the endpoints of isolated utterances”, Bell System Technical Journal, vol. 54, no. 2, pp. 297-315, Feb 1975
  18. J. Picone, “Signal modeling techniques in speech recognition”, Proc. of the IEEE, vol. 81, no. 9, Sep 1993
  19. J. Deller, J. Proakis, and J. Hansen, Discrete Time Processing of Speech Signals, Prentice Hall, NJ, USA, 1993
  20. S. Kopparapu, and M. Laxminarayana, “Choice of Mel filter bank in computing MFCC of a resampled speech”, Proc. IEEE Intl. Conf. Information Sciences Signal Processing and their Applications, pp. 121-124, May 2010
  21. G. Bekesy, Experiments in Hearing, McGraw-Hill, New York, 1960
  22. H. Hassanein, and M. Rudko, “On the use of Discrete Cosine Transform in cepstral analysis”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 32, no. 4, pp. 922-925, 1984
  23. B. Juang, L. Rabiner, and J. Wilpon, “On the use of bandpass liftering in speech recognition”, IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, pp. 765-768, Apr 1986
  24. W. Hong and P. Jingui, “Modified MFCCs for robust speaker recognition”, IEEE Intl. Conf. Intelligent Computing and Intelligent Systems, pp. 276-279, Oct 2010
  25. W. Junqin, and Y. Junjun, “An improved arithmetic of MFCC in speech recognition system”, IEEE Intl. Conf. Electronics, Communications and Control, pp. 719-722, China, Sep 2011
  26. S. Ong, and C. Yang, “A comparative study of text-independent speaker identification using statistical features”, Intl. Journal on Computer Engineering Management, vol. 6, no. 1, 1998
  27. F. Itakura, “Minimum prediction residual principle applied to speech recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-23, pp. 52-72, 1975
  28. E. Keogh, and M. Pazzani, “Derivative dynamic time warping”, Proc. of the 1st SIAM Intl. Conf. Data Mining, Chicago, USA, 2001
  29. S. Salvador, and P. Chan, “FastDTW: toward accurate dynamic time warping in linear time and space”, Proc. of 3rd KDD Workshop on Mining Temporal and Sequential Data, pp. 70-80, 2004
  30. L. Yan-Sheng, and J. Chang-Peng, “Research on improved algorithm of DTW in speech recognition”, IEEE Intl. Conf. Computer Application and System Modeling, pp. 418-421, Oct 2010
  31. K. Chanwoo, and S. Kwang-deok, “Robust DTW-based recognition algorithm for hand-held consumer devices”, IEEE Intl. Conf. Consumer Electronics, pp. 433-434, Jan 2005
  32. R. Leonard, “A database for speaker-independent digit recognition”, IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, pp. 328-331, Mar 1984
Index Terms

Computer Science
Information Sciences

Keywords

Speech recognition, MFCC, Dynamic time warping