
Nepali Speech Recognition using RNN-CTC Model

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2019
Paribesh Regmi, Arjun Dahal, Basanta Joshi

Paribesh Regmi, Arjun Dahal and Basanta Joshi. Nepali Speech Recognition using RNN-CTC Model. International Journal of Computer Applications 178(31):1-6, July 2019. BibTeX

@article{regmi2019nepali,
	author = {Paribesh Regmi and Arjun Dahal and Basanta Joshi},
	title = {Nepali Speech Recognition using RNN-CTC Model},
	journal = {International Journal of Computer Applications},
	issue_date = {July 2019},
	volume = {178},
	number = {31},
	month = {Jul},
	year = {2019},
	issn = {0975-8887},
	pages = {1-6},
	numpages = {6},
	doi = {10.5120/ijca2019918401},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}


This paper presents a neural-network-based Nepali speech recognition model. A Recurrent Neural Network (RNN) is used to process the sequential audio data, and the Connectionist Temporal Classification (CTC) technique [1] is applied so that the RNN can be trained directly on unsegmented audio. CTC is a probabilistic approach that maximizes the probability of the desired label sequence given the RNN output. After processing through the RNN and CTC layers, Nepali text is obtained as output. The paper also defines a set of 67 Nepali characters required for transcribing Nepali speech to text.
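The CTC objective sketched above can be made concrete with a short NumPy illustration. This is a generic sketch of the technique from [1], not the paper's own code: it computes the total probability that frame-wise network outputs assign to one label sequence, summed over all alignments, using the standard forward (alpha) recursion over the blank-extended label sequence. All names here are illustrative.

```python
import numpy as np

def ctc_forward_prob(log_probs, labels, blank=0):
    """CTC probability of `labels` given per-frame log-probabilities
    of shape (T, vocab), summed over all alignments."""
    probs = np.exp(log_probs)            # back to ordinary probabilities
    T = probs.shape[0]
    # Blank-extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    S = len(ext)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]       # start on the leading blank ...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]   # ... or on the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                        # repeat the symbol
            if s > 0:
                a += alpha[t - 1, s - 1]               # advance one step
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]               # skip over a blank
            alpha[t, s] = a * probs[t, ext[s]]
    # A valid path ends on the final blank or the final label.
    return alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)

# Two frames, vocabulary {blank, 'a'}, uniform network outputs:
# the paths a-, -a and aa all collapse to "a", so P("a") = 3 * 0.25.
p = np.log(np.full((2, 2), 0.5))
print(ctc_forward_prob(p, [1]))   # 0.75
```

In training, the negative log of this quantity is the loss that is backpropagated through the RNN; frameworks such as TensorFlow (ref. 14) provide an equivalent built-in loss so this recursion never has to be written by hand.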


  1. A. Graves, S. Fernández, F. Gomez and J. Schmidhuber. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the 23rd International Conference on Machine Learning (ICML '06), Pittsburgh, PA, USA.
  2. G. E. Hinton, S. Osindero and Y.-W. Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7), pp. 1527-1554.
  3. H. A. Bourlard and N. Morgan. 1993. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Norwell, MA, USA.
  4. G. E. Dahl, D. Yu, L. Deng and A. Acero. 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), pp. 30-42.
  5. A. Graves and N. Jaitly. 2014. Towards End-to-End Speech Recognition with Recurrent Neural Networks. In Proceedings of the 31st International Conference on Machine Learning (ICML '14), Beijing, China.
  6. A. Kalakheti, K. P. Bhattarari, S. Kuwar and S. Adhikari. Automatic Speech Recognition for Nepali Language. Tribhuvan University, Nepal.
  7. B. Joshi, A. Gajurel, A. Pokhrel and M. K. Sharma. 2017. HMM Based Isolated Word Nepali Speech Recognition. In International Conference on Machine Learning and Cybernetics, Ningbo, China.
  8. S. Hochreiter and J. Schmidhuber. 1997. Long Short-Term Memory. Neural Computation, 9(8), pp. 1735-1780.
  9. S. Hochreiter. 1998. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(2), pp. 107-116.
  10. M. Schuster and K. K. Paliwal. 1997. Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45(11), pp. 2673-2681.
  11. A. Graves, S. Fernández and J. Schmidhuber. 2005. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In Proceedings of the 2005 International Conference on Artificial Neural Networks (ICANN), Warsaw, Poland.
  12. S. Magre, P. Janse and R. Deshmukh. 2014. A Review on Feature Extraction and Noise Reduction Technique. International Journal of Advanced Research in Computer Science and Software Engineering.
  13. The Python Tutorial.
  14. TensorFlow.


Keywords: Artificial Intelligence, Machine Learning, Automatic Speech Recognition, Recurrent Neural Network, Connectionist Temporal Classification, Softmax, Hidden Markov Model, Nepali Speech Recognition, Long Short-Term Memory (LSTM), Backpropagation, Character Error Rate