Research Article

Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping

by Santosh V. Chapaneri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 40 - Number 3
Year of Publication: 2012
Authors: Santosh V. Chapaneri
10.5120/5022-7167

Santosh V. Chapaneri. Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping. International Journal of Computer Applications. 40, 3 (February 2012), 6-12. DOI=10.5120/5022-7167

@article{10.5120/5022-7167,
  author     = {Santosh V. Chapaneri},
  title      = {Spoken Digits Recognition using Weighted MFCC and Improved Features for Dynamic Time Warping},
  journal    = {International Journal of Computer Applications},
  issue_date = {February 2012},
  volume     = {40},
  number     = {3},
  month      = {February},
  year       = {2012},
  issn       = {0975-8887},
  pages      = {6-12},
  numpages   = {9},
  url        = {https://ijcaonline.org/archives/volume40/number3/5022-7167/},
  doi        = {10.5120/5022-7167},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
Abstract

In this paper, we propose novel techniques for feature parameter extraction based on MFCC and feature recognition using the dynamic time warping algorithm, applied to speaker-independent isolated digit recognition. Using the proposed Weighted MFCC (WMFCC), we achieve low computational overhead in the feature recognition stage, since only 13 weighted MFCC coefficients are used instead of the conventional 39 MFCC coefficients that include the delta and double-delta features. To capture the trends or patterns that a feature sequence exhibits during the alignment process, we compute local and global features using the Improved Features for DTW algorithm (IFDTW), rather than using the raw feature values or their estimated derivatives. Experiments on the TI-Digits corpus demonstrate the effectiveness of the proposed techniques, leading to a higher recognition accuracy of 98.13%.
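
The following Python sketch outlines the two stages described above: a 13-dimensional MFCC front end with per-coefficient weighting, followed by a DTW match against reference templates. It is only a minimal illustration under stated assumptions: the actual WMFCC weights and the IFDTW local/global features are defined in the paper itself, so the weighting below is a generic sinusoidal liftering-style placeholder, the alignment is plain DTW, and the librosa dependency is assumed purely as a convenient MFCC front end.

import numpy as np
import librosa  # assumed here only as a convenient MFCC front end

def weighted_mfcc(y, sr, n_mfcc=13):
    """13 MFCCs per frame, each cepstral dimension scaled by a weight.
    The paper's exact WMFCC weighting is not given in the abstract, so a
    sinusoidal liftering-style weight is used here purely as a placeholder.
    Returns an array of shape (n_frames, n_mfcc)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    k = np.arange(1, n_mfcc + 1)
    w = 1.0 + (n_mfcc / 2.0) * np.sin(np.pi * k / n_mfcc)    # placeholder weights
    return (w[:, None] * mfcc).T

def dtw_distance(a, b):
    """Classical DTW over frame-wise Euclidean distances (not the paper's
    IFDTW: it omits the local/global trend features)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Hypothetical usage: classify a test utterance by the smallest DTW distance
# to per-digit reference templates (waveforms loaded e.g. via librosa.load):
#   test = weighted_mfcc(y_test, sr)
#   best_digit = min(templates, key=lambda d: dtw_distance(test, templates[d]))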

References
  1. R. Cox, C. Kamm, L. Rabiner, J. Schroeter, and J. Wilpon, “Speech and language processing for next-millennium communications services”, Proc. of the IEEE, vol. 88, no. 8, Aug 2000
  2. D. Jurafsky, and J. Martin, Speech and Language Processing, Prentice Hall, 2000
  3. J. Tierney, “A study of LPC analysis of speech in additive noise”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 389-397, 1980
  4. A. Paul, D. Das, and M. Kamal, “Bangla speech recognition system using LPC and ANN”, 7th Intl. Conf. Advances in Pattern Recognition, 2009
  5. L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993
  6. S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 28, no. 4, pp. 357-366, Aug 1980
  7. A. Mishra, M. Chandra, A. Biswas, and S. Sharan, “Robust features for connected Hindi digits recognition”, Intl. Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 4, no. 2, pp. 79-90, June 2011
  8. Z. Jun, S. Kwong, W. Gang, and Q. Hong, “Using Mel-frequency cepstral coefficients in missing data technique”, EURASIP Journal on Applied Signal Processing, vol. 2004, no. 3, pp. 340-346, 2004
  9. O. W. Kwon, K. Chan, J. Hao, and T. W. Lee, “Emotion recognition by speech signals”, in Proc. 8th European Conf. Speech Communication and Technology, pp. 125-128, Geneva, Switzerland, 2003
  10. L. Rabiner, B. Juang, S. Levinson, and M. Sondhi, “Recognition of isolated digits using hidden Markov models with continuous mixture densities”, AT&T Tech. Journal, vol. 64, no. 6, 1985
  11. H. Sakoe, and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978
  12. L. Rabiner, A. Rosenberg, and S. Levinson, “Considerations in dynamic time warping algorithms for discrete word recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978
  13. W. Fu, X. Yang, and Y. Wang, “Heart sound diagnosis based on DTW and MFCC”, 3rd IEEE Intl. Congress on Image and Signal Processing, pp. 2920-2923, Oct 2010
  14. F. Yu, E. Chang, Y. Xu, and H. Shum, “Emotion detection from speech to enrich multimedia content”, in Proc. 2nd IEEE Pacific Rim Conf. Multimedia, pp. 550-557, Beijing, China, 2001
  15. S. Singh, and E. Rajan, “Vector Quantization approach for speaker recognition using MFCC and inverted MFCC”, International Journal of Computer Applications, vol. 17, no. 1, Mar 2011
  16. R. Tato, R. Santos, R. Kompe, and J. Pardo, “Emotional space improves emotion recognition”, in Proc. 7th Intl. Conf. Spoken Language Processing, vol. 3, pp. 2029-2032, Denver, USA, 2002
  17. L. Rabiner, and M. Sambur, “An algorithm for determining the endpoints of isolated utterances”, Bell System Technical Journal, vol. 54, no. 2, pp. 297-315, Feb 1975
  18. J. Picone, “Signal modeling techniques in speech recognition”, Proc. of the IEEE, vol. 81, no. 9, Sep 1993
  19. J. Deller, J. Proakis, and J. Hansen, Discrete Time Processing of Speech Signals, Prentice Hall, NJ, USA, 1993
  20. S. Kopparapu, and M. Laxminarayana, “Choice of Mel filter bank in computing MFCC of a resampled speech”, Proc. IEEE Intl. Conf. Information Sciences Signal Processing and their Applications, pp. 121-124, May 2010
  21. G. Bekesy, Experiments in Hearing, McGraw-Hill, New York, 1960
  22. H. Hassanein, and M. Rudko, “On the use of Discrete Cosine Transform in cepstral analysis”, IEEE Trans. Acoustics, Speech and Signal Processing, vol. 32, no. 4, pp. 922-925, 1984
  23. B. Juang, L. Rabiner, and J. Wilpon, “On the use of bandpass liftering in speech recognition”, IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, pp. 765-768, Apr 1986
  24. W. Hong and P. Jingui, “Modified MFCCs for robust speaker recognition”, IEEE Intl. Conf. Intelligent Computing and Intelligent Systems, pp. 276-279, Oct 2010
  25. W. Junqin, and Y. Junjun, “An improved arithmetic of MFCC in speech recognition system”, IEEE Intl. Conf. Electronics, Communications and Control, pp. 719-722, China, Sep 2011
  26. S. Ong, and C. Yang, “A comparative study of text-independent speaker identification using statistical features”, Intl. Journal on Computer Engineering Management, vol. 6, no. 1, 1998
  27. F. Itakura, “Minimum prediction residual principle applied to speech recognition”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-23, pp. 52-72, 1975
  28. E. Keogh, and M. Pazzani, “Derivative dynamic time warping”, Proc. of the 1st SIAM Intl. Conf. Data Mining, Chicago, USA, 2001
  29. S. Salvador, and P. Chan, “FastDTW: toward accurate dynamic time warping in linear time and space”, Proc. of 3rd KDD Workshop on Mining Temporal and Sequential Data, pp. 70-80, 2004
  30. L. Yan-Sheng, and J. Chang-Peng, “Research on improved algorithm of DTW in speech recognition”, IEEE Intl. Conf. Computer Application and System Modeling, pp. 418-421, Oct 2010
  31. K. Chanwoo, and S. Kwang-deok, “Robust DTW-based recognition algorithm for hand-held consumer devices”, IEEE Intl. Conf. Consumer Electronics, pp. 433-434, Jan 2005
  32. R. Leonard, “A database for speaker-independent digit recognition”, IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, pp. 328-331, Mar 1984
Index Terms

Computer Science
Information Sciences

Keywords

Speech recognition, MFCC, Dynamic time warping