Research Article

TDNN-LSTM-based Acoustic Modeling for verification of Qur’an Recitation for Non-Arabic Speakers using Kaldi Toolkit

by Nazik O’mar Balula, Mohsen Rashwan, Sherif Mahdi Abdo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 28
Year of Publication: 2021
Authors: Nazik O’mar Balula, Mohsen Rashwan, Sherif Mahdi Abdo
10.5120/ijca2021921152

Nazik O’mar Balula, Mohsen Rashwan, Sherif Mahdi Abdo. TDNN-LSTM-based Acoustic Modeling for verification of Qur’an Recitation for Non-Arabic Speakers using Kaldi Toolkit. International Journal of Computer Applications. 183, 28 (Sep 2021), 31-40. DOI=10.5120/ijca2021921152

@article{ 10.5120/ijca2021921152,
author = { Nazik O’mar Balula and Mohsen Rashwan and Sherif Mahdi Abdo },
title = { TDNN-LSTM-based Acoustic Modeling for verification of Qur’an Recitation for Non-Arabic Speakers using Kaldi Toolkit },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2021 },
volume = { 183 },
number = { 28 },
month = { Sep },
year = { 2021 },
issn = { 0975-8887 },
pages = { 31-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number28/32109-2021921152/ },
doi = { 10.5120/ijca2021921152 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Nazik O’mar Balula
%A Mohsen Rashwan
%A Sherif Mahdi Abdo
%T TDNN-LSTM-based Acoustic Modeling for verification of Qur’an Recitation for Non-Arabic Speakers using Kaldi Toolkit
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 28
%P 31-40
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Automatic Speech Recognition (ASR) has become an important component of Human-Computer Interaction (HCI), for tasks such as learning and processing natural languages. This paper presents a hybrid system that uses GMM-HMM (Gaussian Mixture Model-Hidden Markov Model) and TDNN-LSTM (Time Delay Neural Network with Long Short-Term Memory) acoustic models to detect and correct pronunciation errors in Quran recitation by non-Arabic speakers, specifically Indian speakers. The developed system targets 10 Arabic letters that non-Arabic speakers cannot pronounce correctly and may confuse with other letters that share the same articulation point. Training and testing data were collected from 94 Indian speakers. MFCCs were used for feature extraction, whereas GMM-HMM and TDNN-LSTM were used as the recognition tools. The main contribution of the system is improving the accuracy of the HAFSS© system by using a deep neural network instead of GMM-HMM. Recipes from the open-source Kaldi ASR toolkit were used to build, train, test, and evaluate the system. The developed system outperforms the GMM-HMM model by 1.56% based on the Kaldi toolkit word accuracy equation. The accuracy for the letter SUD using the DNN-HMM model based on the Kaldi toolkit outperforms the GMM-HMM model by 1% and at the same time outperforms the DNN-HMM model based on the HTK toolkit by 9.5%. The system accuracy was 95.14% using GMM-HMM and 96.88% using TDNN-LSTM. Across the 10 letters, the best accuracy was 97.3%, achieved by the letter TTA, and the worst was 90.1%, achieved by the letter DAA. The rest of the paper is divided into nine sections. Section 1 is the introduction. Section 2 introduces Qur’an recitation problems along with previous and related studies. Section 3 outlines the project goal, and Section 4 explains the structure of the system. The acoustic model training is explained in Section 5. Section 6 presents the experimental results and discussion.
The comparison of model results is presented in Section 7. A comparison with previously published results is given in Section 8. Recommendations and the conclusion are in Section 9.
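
The abstract refers to the Kaldi toolkit word accuracy equation without stating it. The following is a minimal sketch, assuming the conventional definition Accuracy = 100 × (N − S − D − I) / N, where N is the number of reference words and S, D, and I are the substitution, deletion, and insertion counts from a minimum edit-distance alignment. This is equivalent to 100 minus the word error rate computed from the same counts. The function names `align_counts` and `word_accuracy` are hypothetical and are not part of Kaldi or of the authors' system.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum
    edit-distance alignment of the hypothesis against the reference."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, cols):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)          # match, no error
            else:
                diag = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = cost[i - 1][j]
            dele = (t + 1, s, d + 1, n)      # deletion
            t, s, d, n = cost[i][j - 1]
            ins = (t + 1, s, d, n + 1)       # insertion
            cost[i][j] = min(diag, dele, ins, key=lambda c: c[0])
    _, subs, dels, inss = cost[-1][-1]
    return subs, dels, inss


def word_accuracy(ref: List[str], hyp: List[str]) -> float:
    """Assumed definition: 100 * (N - S - D - I) / N, in percent."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    subs, dels, inss = align_counts(ref, hyp)
    return 100.0 * (len(ref) - subs - dels - inss) / len(ref)
```

For example, `word_accuracy("a b c d".split(), "a x c".split())` counts one substitution and one deletion against four reference words. Because insertions are also subtracted, this measure can become negative when the hypothesis is much longer than the reference.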

Index Terms

Computer Science
Information Sciences

Keywords

Automatic Speech Recognition, Deep Neural Network, Kaldi toolkit, Time Delay Neural Network, Quranic Recitation Problems