Performance Evaluation of CMN for Mel-LPC based Speech Recognition in Different Noisy Environments

Md. Mahfuzur Rahman; Sanjit Kumar Saha; Md. Zakir Hossain; Md. Babul Islam

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 58 - Number 10

Year of Publication: 2012

Authors: Md. Mahfuzur Rahman, Sanjit Kumar Saha, Md. Zakir Hossain, Md. Babul Islam

10.5120/9316-3548

Md. Mahfuzur Rahman, Sanjit Kumar Saha, Md. Zakir Hossain, Md. Babul Islam. Performance Evaluation of CMN for Mel-LPC based Speech Recognition in Different Noisy Environments. International Journal of Computer Applications. 58, 10 (November 2012), 6-10. DOI=10.5120/9316-3548

@article{ 10.5120/9316-3548,
author = { Md. Mahfuzur Rahman and Sanjit Kumar Saha and Md. Zakir Hossain and Md. Babul Islam },
title = { Performance Evaluation of CMN for Mel-LPC based Speech Recognition in Different Noisy Environments },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 58 },
number = { 10 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 6-10 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume58/number10/9316-3548/ },
doi = { 10.5120/9316-3548 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:02:04.434156+05:30
%A Md. Mahfuzur Rahman
%A Sanjit Kumar Saha
%A Md. Zakir Hossain
%A Md. Babul Islam
%T Performance Evaluation of CMN for Mel-LPC based Speech Recognition in Different Noisy Environments
%J International Journal of Computer Applications
%@ 0975-8887
%V 58
%N 10
%P 6-10
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This study aims to develop a noise-robust distributed speech recognizer for real-world applications by employing Cepstral Mean Normalization (CMN) for robust feature extraction. The main focus of the work is coping with different noisy environments. To this end, Mel-LP based speech analysis is used for speech coding on the linear frequency scale, with a first-order all-pass filter applied in place of the unit delay. The mismatch between the training and test phases is reduced by applying CMN to the Mel-LP cepstral coefficients, in an effort to reduce the effects of additive noise and channel distortion. The performance of the proposed system has been evaluated on test set A of the Aurora-2 database, which is a subset of the TIDigits database contaminated by additive noise and channel effects. The experiments cover four different noisy environments. The baseline Mel-LPC system achieves an average word accuracy of 59.05%; applying CMN to the Mel-LP cepstral coefficients improves this to 68.02%, a gain of 8.97 percentage points. CMN therefore improves average recognition accuracy across the noisy environments tested.
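The abstract does not spell out the CMN formula, so the following is only a minimal sketch that assumes the common per-utterance form: the mean of each cepstral coefficient over all frames of an utterance is subtracted from that coefficient in every frame. The function name `cepstral_mean_normalize` and the toy values are hypothetical and are not taken from the paper. The example illustrates why CMN helps with channel distortion: a fixed channel appears as a constant offset in the cepstral domain, which the mean subtraction removes.

```python
from typing import List, Sequence


def cepstral_mean_normalize(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean from each cepstral dimension.

    `frames` is a sequence of cepstral vectors, one per analysis frame.
    This is the standard per-utterance CMN, assumed here for illustration.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(dim)]
    return [[frame[k] - means[k] for k in range(dim)] for frame in frames]


# Toy utterance: 3 frames of 2 made-up cepstral coefficients.
clean = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]

# A time-invariant channel adds a constant vector in the cepstral domain.
channel = [0.5, -1.5]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)
```

In this toy case `norm_clean` and `norm_distorted` coincide, since the constant offset is absorbed into the mean. Additive noise does not reduce to a constant cepstral offset, so for that kind of mismatch CMN is only a partial compensation.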

Index Terms

Computer Science

Information Sciences

Keywords

Mel-LPC, bilinear transformation, CMN, Aurora 2 database