
Investigating Speech Attribute Features for Anti-Phone based Pronunciation Verification Approach

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Authors:
Ayat Hafzalla Ahmed, Hager Morsy, Sherif Mahdi Abdo
10.5120/ijca2021921465

Ayat Hafzalla Ahmed, Hager Morsy and Sherif Mahdi Abdo. Investigating Speech Attribute Features for Anti-Phone based Pronunciation Verification Approach. International Journal of Computer Applications 183(19):1-10, August 2021.

@article{10.5120/ijca2021921465,
	author = {Ayat Hafzalla Ahmed and Hager Morsy and Sherif Mahdi Abdo},
	title = {Investigating Speech Attribute Features for Anti-Phone based Pronunciation Verification Approach},
	journal = {International Journal of Computer Applications},
	issue_date = {August 2021},
	volume = {183},
	number = {19},
	month = {Aug},
	year = {2021},
	issn = {0975-8887},
	pages = {1-10},
	numpages = {10},
	url = {http://www.ijcaonline.org/archives/volume183/number19/32030-2021921465},
	doi = {10.5120/ijca2021921465},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

With increased computing power, there has been renewed interest in computer-assisted pronunciation learning (CAPL) applications in recent years. Accurate automatic pronunciation verification plays an important role in automating the learning process and improving its quality. Pronunciation errors can be divided into phonemic and prosodic types. In this paper we propose a phoneme-level pronunciation verification method for Quranic Arabic based on the anti-phone model. For each phoneme, a binary support vector machine (SVM) classifier is trained to distinguish that phoneme from all other phonemes. The SVM classifier is trained on speech attribute features derived from a bank of speech attribute detectors, namely the manners and places of articulation. A feed-forward deep neural network (DNN) architecture is used for the speech attribute detectors. The system is evaluated on speech corpora collected from fluent Quran reciters and achieves phoneme-level false-acceptance and false-rejection rates ranging from 2% to 25%.


Keywords

Speech attributes, pronunciation verification, anti-phone model, Quranic Arabic