Research Article

Spoken English Alphabet Recognition with Mel Frequency Cepstral Coefficients and Back Propagation Neural Networks

by T. B. Adam, Md Salam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 42 - Number 12
Year of Publication: 2012
DOI: 10.5120/5745-7946

T. B. Adam, Md Salam. Spoken English Alphabet Recognition with Mel Frequency Cepstral Coefficients and Back Propagation Neural Networks. International Journal of Computer Applications. 42, 12 (March 2012), 21-27. DOI=10.5120/5745-7946

@article{ 10.5120/5745-7946,
author = { T. B. Adam and Md Salam },
title = { Spoken English Alphabet Recognition with Mel Frequency Cepstral Coefficients and Back Propagation Neural Networks },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 42 },
number = { 12 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 21-27 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume42/number12/5745-7946/ },
doi = { 10.5120/5745-7946 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Spoken alphabet recognition, a subset of speech recognition and pattern recognition, has many applications. It is not a simple task, however, because the English alphabet contains highly confusable sets of letters. The strong acoustic similarity between these letters can reduce the accuracy of speech recognition systems. One such confusable set is the E-set, which consists of the letters B, C, D, E, G, P, T, V and Z. In this study, we present an investigation of an isolated alphabet speech recognition system using Mel Frequency Cepstral Coefficients (MFCC) and a Back-propagation Neural Network (BPNN) for the E-set and for all 26 letters of the English alphabet. The learning rate and momentum rate of the BPNN are varied in order to achieve the best recognition rate for the E-set and for all 26 letters. By adjusting these parameters, we achieved recognition rates of 62.28% and 70.49% for E-set recognition under speaker-independent and speaker-dependent conditions, respectively.
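
The role of the two tuned parameters can be illustrated with a minimal sketch of back-propagation with a momentum term. This is not the authors' implementation: it assumes a single hidden layer, sigmoid units, a squared-error loss, per-sample updates and no bias terms, and every function name (momentum_step, forward, train_bpnn, predict) is hypothetical. In the setting of the abstract, each input vector would hold MFCC features for one spoken letter and each target would be a one-hot vector over the letter classes.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def momentum_step(weight, grad, velocity, learning_rate, momentum):
    """One weight update: the new change blends the previous change
    (scaled by the momentum rate) with the gradient step (scaled by
    the learning rate)."""
    new_velocity = momentum * velocity - learning_rate * grad
    return weight + new_velocity, new_velocity


def forward(x, w1, w2):
    """w1[j] holds the input weights of hidden unit j; w2[k] holds the
    hidden weights of output unit k."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w1]
    output = [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w2]
    return hidden, output


def train_bpnn(samples, targets, n_hidden=8, learning_rate=0.1,
               momentum=0.9, epochs=100, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(targets[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    v1 = [[0.0] * n_in for _ in range(n_hidden)]
    v2 = [[0.0] * n_hidden for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            hidden, output = forward(x, w1, w2)
            # Error terms: squared-error gradient passed back through the sigmoids.
            d_out = [(o - tk) * o * (1.0 - o) for o, tk in zip(output, t)]
            d_hid = [h * (1.0 - h) * sum(d_out[k] * w2[k][j] for k in range(n_out))
                     for j, h in enumerate(hidden)]
            for k in range(n_out):
                for j in range(n_hidden):
                    w2[k][j], v2[k][j] = momentum_step(
                        w2[k][j], d_out[k] * hidden[j], v2[k][j],
                        learning_rate, momentum)
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i], v1[j][i] = momentum_step(
                        w1[j][i], d_hid[j] * x[i], v1[j][i],
                        learning_rate, momentum)
    return w1, w2


def predict(x, w1, w2):
    """Return the index of the most active output unit."""
    _, output = forward(x, w1, w2)
    return max(range(len(output)), key=output.__getitem__)
```

Under these assumptions, a parameter sweep of the kind the abstract describes would call train_bpnn repeatedly with different learning_rate and momentum values and compare the recognition rate of predict on held-out utterances.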

Index Terms

Computer Science
Information Sciences

Keywords

Mel-frequency Cepstral Coefficients (MFCC), Error Back-propagation Neural Network, E-set