T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You)

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Authors:
Varun Soni, Rizwan Shaikh, Sayantan Mahato, Shaikh Phiroj
DOI: 10.5120/ijca2021921387

Varun Soni, Rizwan Shaikh, Sayantan Mahato and Shaikh Phiroj. T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You). International Journal of Computer Applications 183(9):1-6, June 2021. BibTeX

@article{10.5120/ijca2021921387,
	author = {Varun Soni and Rizwan Shaikh and Sayantan Mahato and Shaikh Phiroj},
	title = {T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You)},
	journal = {International Journal of Computer Applications},
	issue_date = {June 2021},
	volume = {183},
	number = {9},
	month = {Jun},
	year = {2021},
	issn = {0975-8887},
	pages = {1-6},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume183/number9/31952-2021921387},
	doi = {10.5120/ijca2021921387},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

In today's globalized world, language remains a barrier to the healthy exchange of information. Platforms such as YouTube and Facebook make it easy to share knowledge with people all around the world, yet language can still impede that flow. Someone in India who wants to read written content from another country can use a service such as Google Translate. The same cannot be said for content rendered in an audio-visual medium, since no such tool has been developed to help people comprehend content of that type. Students in particular experience this first-hand when they try to access content from other universities that is delivered in a language they are not well versed in. With these issues in mind, this group is building an automatic voice dubbing system: a speech-to-speech translation pipeline that helps users understand one another without worrying about the language barrier. The model is called T.U.E.S.D.A.Y. (Translation Using machine learning from English Speech to Devanagari Automated for You) and is divided into three conversion modules: English speech to English text, English text to Devanagari text, and finally Devanagari text to Hindi speech. These modules work in tandem to form a single integrated model.
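
The three-module structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `DubbingPipeline`, the stage signatures, and the stand-in stage functions are all hypothetical, chosen only to show how three independent converters might be chained. In a real system each stand-in would be replaced by a trained speech recognition, translation, or speech synthesis model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the abstract does not specify these interfaces.
SpeechToText = Callable[[bytes], str]    # English audio -> English text
TextTranslator = Callable[[str], str]    # English text -> Devanagari text
TextToSpeech = Callable[[str], bytes]    # Devanagari text -> Hindi audio


@dataclass
class DubbingPipeline:
    """Chains the three conversion modules in the order the abstract lists them."""
    recognize: SpeechToText
    translate: TextTranslator
    synthesize: TextToSpeech

    def run(self, english_audio: bytes) -> bytes:
        english_text = self.recognize(english_audio)
        devanagari_text = self.translate(english_text)
        return self.synthesize(devanagari_text)


# Stand-in stages (assumptions) so the sketch runs without any trained model.
def fake_recognize(audio: bytes) -> str:
    # Pretend the audio bytes already contain the transcript.
    return audio.decode("utf-8")


# Toy word-for-word lexicon; real translation is not word-by-word.
TOY_LEXICON = {"hello": "नमस्ते", "world": "दुनिया"}


def fake_translate(text: str) -> str:
    # Look up each lowercase word, leaving unknown words unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    # Pretend the UTF-8 encoded text is the output waveform.
    return text.encode("utf-8")


pipeline = DubbingPipeline(fake_recognize, fake_translate, fake_synthesize)
hindi_audio = pipeline.run(b"hello world")
```

Keeping each stage behind a plain callable means any one module could be swapped or retrained without touching the other two, which is one plausible reading of the modular design the abstract describes.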

Keywords

Neural Networks, TensorFlow, DeepSpeech, Keras, FastSpeech