Research Article

T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You)

by Varun Soni, Rizwan Shaikh, Sayantan Mahato, Shaikh Phiroj
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 9
Year of Publication: 2021
Authors: Varun Soni, Rizwan Shaikh, Sayantan Mahato, Shaikh Phiroj
DOI: 10.5120/ijca2021921387

Varun Soni, Rizwan Shaikh, Sayantan Mahato, Shaikh Phiroj. T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You). International Journal of Computer Applications. 183, 9 (Jun 2021), 1-6. DOI=10.5120/ijca2021921387

@article{ 10.5120/ijca2021921387,
author = { Varun Soni and Rizwan Shaikh and Sayantan Mahato and Shaikh Phiroj },
title = { T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You) },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2021 },
volume = { 183 },
number = { 9 },
month = { Jun },
year = { 2021 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number9/31952-2021921387/ },
doi = { 10.5120/ijca2021921387 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:16:16.519178+05:30
%A Varun Soni
%A Rizwan Shaikh
%A Sayantan Mahato
%A Shaikh Phiroj
%T T.U.E.S.D.A.Y (Translation Using machine learning from English Speech to Devanagari Automated for You)
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 9
%P 1-6
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In today's globalized world, language remains a barrier to the healthy exchange of information. Platforms such as YouTube and Facebook make it easy to share knowledge with people around the world, yet language can still impede that flow. Someone in India who wants to read written content from another country can use a service such as Google Translate. The same is not true of audio-visual content, for which no comparable tool has been developed to help people comprehend it. Students in particular experience this first-hand when they try to access content from other universities that is delivered in a language they are not well versed in. With these issues in mind, this group is building an automatic voice dubbing system: a speech-to-speech translation pipeline that helps users understand one another without worrying about the language barrier. The model is called T.U.E.S.D.A.Y. (Translation Using machine learning from English Speech to Devanagari Automated for You) and is divided into three conversion modules: English speech to English text, English text to Devanagari text, and finally Devanagari text to Hindi speech. These modules work in tandem to form a single integrated model.
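
The three-module structure described in the abstract can be pictured as a chain of functions in which each stage consumes the previous stage's output. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the `DubbingPipeline` class, the stage signatures, and the toy stand-in functions are all hypothetical and exist only to show how the modules could be composed. In a real system the three stages would presumably be backed by trained models for speech recognition, translation, and speech synthesis.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the abstract does not specify these interfaces.
SpeechToText = Callable[[bytes], str]   # English audio -> English text
TextToText = Callable[[str], str]       # English text -> Devanagari text
TextToSpeech = Callable[[str], bytes]   # Devanagari text -> Hindi audio


@dataclass(frozen=True)
class DubbingPipeline:
    """Chains the three conversion modules named in the abstract."""

    recognize: SpeechToText
    translate: TextToText
    synthesize: TextToSpeech

    def run(self, english_audio: bytes) -> bytes:
        english_text = self.recognize(english_audio)
        devanagari_text = self.translate(english_text)
        return self.synthesize(devanagari_text)


# Toy stand-ins so the sketch runs without any trained model.
# They treat UTF-8 bytes as "audio" and use a two-word lexicon.
TOY_LEXICON = {"hello": "नमस्ते", "world": "दुनिया"}


def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = DubbingPipeline(fake_recognize, fake_translate, fake_synthesize)
hindi_audio = pipeline.run(b"hello world")
```

Keeping each stage behind a plain callable means any one module can be replaced or tested in isolation without touching the other two, which is one reasonable reading of the modules working "in tandem".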

References
  1. DeepSpeech - an open-source speech-to-text engine. https://github.com/mozilla/DeepSpeech. Accessed: 2020-05-13.
  2. Google Colab - a Python development environment that runs in the browser using Google Cloud. https://research.google.com/colaboratory/. Accessed: 2020-05-13.
  3. Keras - an open-source software library for artificial neural networks. https://keras.io/. Accessed: 2020-05-13.
  4. TensorBoard - TensorFlow’s visualization toolkit. https://www.tensorflow.org/tensorboard. Accessed: 2020-05-13.
  5. TensorFlow - a free and open-source software library for machine learning. https://www.tensorflow.org/. Accessed: 2020-05-13.
  6. Yi-Hsiu Liao, Hung-yi Lee, and Lin-shan Lee. Towards structured deep neural network for automatic speech recognition. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pages 137–144. IEEE, 2015.
  7. Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, and Tie-Yan Liu. FastSpeech: Fast, robust and controllable text to speech, 2019.
  8. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s neural machine translation system: Bridging the gap
Index Terms

Computer Science
Information Sciences

Keywords

Neural Networks, TensorFlow, DeepSpeech, Keras, FastSpeech