Research Article

Text Dependent Speaker Identification System using Discrete HMM in Noise

by Md. Rabiul Islam, Md. Fayzur Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 21 - Number 3
Year of Publication: 2011
10.5120/2494-3370

Md. Rabiul Islam, Md. Fayzur Rahman. Text Dependent Speaker Identification System using Discrete HMM in Noise. International Journal of Computer Applications. 21, 3 (May 2011), 7-13. DOI=10.5120/2494-3370

@article{ 10.5120/2494-3370,
author = { Md. Rabiul Islam and Md. Fayzur Rahman },
title = { Text Dependent Speaker Identification System using Discrete HMM in Noise },
journal = { International Journal of Computer Applications },
issue_date = { May 2011 },
volume = { 21 },
number = { 3 },
month = { May },
year = { 2011 },
issn = { 0975-8887 },
pages = { 7-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume21/number3/2494-3370/ },
doi = { 10.5120/2494-3370 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:07:32.637013+05:30
%A Md. Rabiul Islam
%A Md. Fayzur Rahman
%T Text Dependent Speaker Identification System using Discrete HMM in Noise
%J International Journal of Computer Applications
%@ 0975-8887
%V 21
%N 3
%P 7-13
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes an improved strategy for automated text-dependent speaker identification in noisy environments. The identification process combines the Hidden Markov Model technique with cepstral-based features. A Wiener filter is used to remove background noise from the source utterance. Speech pre-processing techniques such as start-end point detection, pre-emphasis filtering, frame blocking and windowing are applied to the speech utterances. Features are extracted using RCC, MFCC, ΔMFCC, ΔΔMFCC, LPC and LPCC. After parameterization of the speech, a Discrete Hidden Markov Model is used for both learning and identification. Several feature extraction techniques are compared in order to optimize identification performance, and the performance differs in almost every case. The highest speaker identification rates achieved by the closed-set text-dependent speaker identification system are 93% in the noiseless environment and 69.27% in the noisy environment.
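
The identification step described in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a sequence of codebook indices (vector-quantised feature frames) and that one discrete HMM per enrolled speaker has already been trained. The two toy models, their state count, the three-symbol codebook and the function names `forward_log_likelihood` and `identify` are hypothetical and are not taken from the paper. Scoring uses the scaled forward algorithm, and the closed-set decision picks the enrolled speaker whose model gives the highest log-likelihood.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm).

    obs   : list of codebook indices, one per feature frame
    start : start[i]    = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # Initialisation: joint probability of the first symbol and each state.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: propagate through the transitions, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot generate this sequence
        # Rescale to avoid underflow; the log of the scales sums to log P.
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def identify(obs, models):
    """Closed-set decision: the enrolled speaker with the best-scoring model."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over a three-symbol codebook.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "speaker_a": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "speaker_b": ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

if __name__ == "__main__":
    print(identify([0, 0, 1, 1], models))
    print(identify([2, 2, 1, 1], models))
```

In a closed-set system every test utterance is assigned to one of the enrolled speakers, so no rejection threshold is applied here. An open-set variant would additionally compare the best score against a threshold.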

Index Terms

Computer Science
Information Sciences

Keywords

Noise Robust Speaker Identification, Discrete Hidden Markov Model, Speech Signal Processing, Speech Feature Extraction