Telephony Speech Recognition System: Challenges

IJCA Proceedings on National Conference on Communication Technologies & its impact on Next Generation Computing 2012
© 2012 by IJCA Journal
CTNGC - Number 1
Year of Publication: 2012
Authors:
Joyanta Basu
Rajib Roy
Milton S. Bepari
Soma Khan

Joyanta Basu, Rajib Roy, Milton S Bepari and Soma Khan. Article: Telephony Speech Recognition System: Challenges. IJCA Proceedings on National Conference on Communication Technologies & its impact on Next Generation Computing 2012 CTNGC(1):30-36, November 2012. Full text available. BibTeX

@article{key:article,
	author = {Joyanta Basu and Rajib Roy and Milton S. Bepari and Soma Khan},
	title = {Article: Telephony Speech Recognition System: Challenges},
	journal = {IJCA Proceedings on National Conference on Communication Technologies \& its impact on Next Generation Computing 2012},
	year = {2012},
	volume = {CTNGC},
	number = {1},
	pages = {30-36},
	month = {November},
	note = {Full text available}
}

Abstract

This paper describes the challenges of designing a telephony Automatic Speech Recognition (ASR) system. Telephonic speech data are collected automatically from all geographical regions of West Bengal to cover the major dialectal variations of spoken Bangla. All incoming calls are handled by an Asterisk server, which acts as the computer telephony interface (CTI). The system asks a series of queries, and the users' spoken responses are stored and transcribed manually for ASR system training. In a real-time scenario, telephonic speech contains channel drops, silence or no-speech events, truncated speech signals, noisy signals, etc., along with the desired speech event. This paper describes these challenges and briefly presents techniques that handle such unwanted signals to a certain extent, so that a nearly clean speech signal can be passed to the ASR system.
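
The abstract does not say which techniques the paper uses to screen out unwanted signals. As an illustration only, the sketch below flags no-speech and truncated recordings using short-time energy and zero-crossing rate, two common features for this kind of screening. The 8 kHz sampling rate, the frame length, the thresholds, the label names, and the function names are all assumptions made for this example and are not taken from the paper.

```python
import math


def frame_features(samples, frame_len=160):
    """Return a list of (energy, zero_crossing_rate) per non-overlapping frame.

    frame_len=160 corresponds to 20 ms at an assumed 8 kHz telephony rate.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of squared sample values in the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings / (frame_len - 1)))
    return features


def classify_recording(samples, frame_len=160, energy_thresh=1e-4,
                       zcr_max=0.35, edge_frames=2):
    """Label a recording 'no_speech', 'truncated', or 'ok' (hypothetical labels).

    A frame counts as speech-like when its energy is above energy_thresh and
    its zero-crossing rate is below zcr_max (broadband noise tends to have a
    high rate). Both thresholds are illustrative and would need tuning.
    """
    feats = frame_features(samples, frame_len)
    active = [e > energy_thresh and z < zcr_max for e, z in feats]
    if not any(active):
        return "no_speech"
    # Speech-like frames at the very start or end suggest the call recording
    # began late or was cut off before the speaker finished.
    if any(active[:edge_frames]) or any(active[-edge_frames:]):
        return "truncated"
    return "ok"


def tone(n_samples, freq=200.0, rate=8000, amp=0.5):
    """Synthetic stand-in for voiced speech: a low-frequency sine tone."""
    return [amp * math.sin(2 * math.pi * freq * n / rate) for n in range(n_samples)]


silence = [0.0] * 4000
print(classify_recording(silence + tone(4000) + silence))
print(classify_recording(silence + tone(4000)))
print(classify_recording(silence))
```

Samples are assumed to be floats in the range -1 to 1. A real system would need thresholds calibrated on actual telephone recordings, and a crude rule like this will misclassify unvoiced sounds that have a high zero-crossing rate.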
