Research Article

Analysis of a Small Vocabulary Bangla Speech Database for Recognition

by Sumana Huque, Ahsan Habib Rasel, M. Babul Islam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 133 - Number 6
Year of Publication: 2016
Authors: Sumana Huque, Ahsan Habib Rasel, M. Babul Islam
10.5120/ijca2016907827

Sumana Huque, Ahsan Habib Rasel, M. Babul Islam. Analysis of a Small Vocabulary Bangla Speech Database for Recognition. International Journal of Computer Applications. 133, 6 (January 2016), 22-28. DOI=10.5120/ijca2016907827

@article{ 10.5120/ijca2016907827,
author = { Sumana Huque and Ahsan Habib Rasel and M. Babul Islam },
title = { Analysis of a Small Vocabulary Bangla Speech Database for Recognition },
journal = { International Journal of Computer Applications },
issue_date = { January 2016 },
volume = { 133 },
number = { 6 },
month = { January },
year = { 2016 },
issn = { 0975-8887 },
pages = { 22-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume133/number6/23791-2016907827/ },
doi = { 10.5120/ijca2016907827 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:30:25.607196+05:30
%A Sumana Huque
%A Ahsan Habib Rasel
%A M. Babul Islam
%T Analysis of a Small Vocabulary Bangla Speech Database for Recognition
%J International Journal of Computer Applications
%@ 0975-8887
%V 133
%N 6
%P 22-28
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A standard database is essential for research in speech signal processing. Such databases exist for many languages, but not for Bangla. This article therefore develops and analyzes a small vocabulary Bangla speech database for recognition. The database covers 11 Bangla digit words (/ak/, /dui/, /tin/, /chaar/, /panch/, /chhoy/, /shaat/, /aat/, /noy/, /zero/, /shunno/). It consists of two sets of data: a training dataset and a testing dataset. The training dataset contains 3824 utterances from 50 speakers. The testing dataset contains 1985 utterances from 52 speakers and is subdivided into four groups (clean1, clean2, clean3 and clean4). All recordings were made in a quiet, but not soundproof, room with an A4Tech HS-60 headset microphone connected to a computer with an Intel Dual Core 2.0 GHz CPU. The speech files were recorded and edited with WavePad. Finally, an HMM-based recognizer was developed to evaluate the database, using a Mel-LPC based front-end and HTK (Hidden Markov Model Toolkit) as the reference recognizer. The average word accuracy over the test sets is 98.05%.
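As an illustration of the word accuracy figure quoted above, the following minimal Python sketch computes percent accuracy in the way HTK's HResults tool defines it, (N - D - S - I) / N x 100, where N is the number of reference words and D, S and I are deletion, substitution and insertion errors. The function name and the per-set error counts are hypothetical and are not taken from the article. Averaging the four per-set scores with equal weight is also an assumption, since the abstract does not say whether the average is per set or pooled over all utterances.

```python
def word_accuracy(n_ref: int, deletions: int, substitutions: int, insertions: int) -> float:
    """Percent word accuracy, (N - D - S - I) / N * 100, as HTK's HResults defines it."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Hypothetical (N, D, S, I) counts per test set; NOT the article's data.
hypothetical_counts = {
    "clean1": (500, 2, 5, 1),
    "clean2": (500, 1, 6, 2),
    "clean3": (500, 3, 4, 1),
    "clean4": (485, 2, 7, 0),
}

# Accuracy of each test set, then an unweighted mean over the four sets (assumed).
per_set = {name: word_accuracy(*counts) for name, counts in hypothetical_counts.items()}
average = sum(per_set.values()) / len(per_set)

for name, acc in per_set.items():
    print(f"{name}: {acc:.2f}%")
print(f"average: {average:.2f}%")
```

Because insertions are subtracted as well, this accuracy can fall below the plain percent-correct score, (N - D - S) / N x 100, for the same recognizer output.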

Index Terms

Computer Science
Information Sciences

Keywords

Bangla Speech Database, Bangla Speech Recognition, HMM, Mel-LPC