A Review of Challenges in Automatic Speech Recognition

Harshalata Petkar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 151 - Number 3

Year of Publication: 2016

Authors: Harshalata Petkar

10.5120/ijca2016911706

Harshalata Petkar. A Review of Challenges in Automatic Speech Recognition. International Journal of Computer Applications 151, 3 (Oct 2016), 23-26. DOI=10.5120/ijca2016911706

@article{ 10.5120/ijca2016911706,

author = { Harshalata Petkar },

title = { A Review of Challenges in Automatic Speech Recognition },

journal = { International Journal of Computer Applications },

issue_date = { Oct 2016 },

volume = { 151 },

number = { 3 },

month = { Oct },

year = { 2016 },

issn = { 0975-8887 },

pages = { 23-26 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume151/number3/26214-2016911706/ },

doi = { 10.5120/ijca2016911706 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%A Harshalata Petkar

%T A Review of Challenges in Automatic Speech Recognition

%J International Journal of Computer Applications

%@ 0975-8887

%V 151

%N 3

%P 23-26

%D 2016

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Speech is nature’s gift to human beings; it contributes to their intelligence and distinguishes them from the rest of the animal kingdom. From a technological standpoint, speech recognition is a buzzword today, as communication and hands-free computing evolve day by day. Speech is an important mode of communication and interaction with digital computers. Speech recognition has a wide range of applications in computer science, medical science, psychology, sports, and neurology, but developing it poses many challenges. The hurdles in building a real-time speech recognizer range from adverse environments to the anatomy of the human body, and they include linguistic aspects as well. This paper explores the various challenges in developing a robust automatic speech recognition (ASR) system.

Index Terms

Computer Science

Information Sciences

Keywords

Speech, Speech recognition, Communication, Linguistics