Research Article

Assamese to English Statistical Machine Translation Integrated with a Transliteration Module

by Pranjal Das, Kalyanee K. Baruah
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 5
Year of Publication: 2014
Authors: Pranjal Das, Kalyanee K. Baruah
10.5120/17522-8084

Pranjal Das, Kalyanee K. Baruah. Assamese to English Statistical Machine Translation Integrated with a Transliteration Module. International Journal of Computer Applications. 100, 5 (August 2014), 20-24. DOI=10.5120/17522-8084

@article{ 10.5120/17522-8084,
author = { Pranjal Das and Kalyanee K. Baruah },
title = { Assamese to English Statistical Machine Translation Integrated with a Transliteration Module },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 5 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 20-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume100/number5/17522-8084/ },
doi = { 10.5120/17522-8084 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:29:11.254633+05:30
%A Pranjal Das
%A Kalyanee K. Baruah
%T Assamese to English Statistical Machine Translation Integrated with a Transliteration Module
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 5
%P 20-24
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper describes how an Assamese sentence is translated into English using statistical machine translation (SMT), the paradigm in which translations from a source language to a target language are generated by statistical models. Moses is used as the SMT platform, with GIZA++ for word alignment and IRSTLM for language model training. A transliteration module is also integrated into the system to handle out-of-vocabulary (OOV) words.
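The role of the transliteration module can be illustrated with a minimal sketch. It is not the authors' implementation and does not reflect how Moses integrates transliteration internally. It only shows the general idea: a token found in the translation table is translated, and an OOV token falls back to a character-level mapping into Latin script. The tables `PHRASE_TABLE` and `CHAR_MAP` and the function names are hypothetical, and the character map is deliberately simplified (each consonant is assumed to carry its inherent vowel).

```python
from typing import Dict, List

# Hypothetical toy translation table; a real system learns phrase pairs
# and their probabilities from an aligned parallel corpus.
PHRASE_TABLE: Dict[str, str] = {
    "পানী": "water",
}

# Hypothetical, simplified character-to-Latin map used only for this sketch.
CHAR_MAP: Dict[str, str] = {
    "ক": "ka",
    "ম": "ma",
    "ল": "la",
}


def transliterate(word: str, char_map: Dict[str, str]) -> str:
    """Map each source character to Latin script.

    Characters missing from the map are passed through unchanged.
    """
    return "".join(char_map.get(ch, ch) for ch in word)


def translate_tokens(
    tokens: List[str],
    table: Dict[str, str],
    char_map: Dict[str, str],
) -> List[str]:
    """Translate known tokens; transliterate out-of-vocabulary tokens."""
    output = []
    for token in tokens:
        if token in table:
            # In-vocabulary: use the translation table entry.
            output.append(table[token])
        else:
            # OOV: fall back to character-level transliteration.
            output.append(transliterate(token, char_map))
    return output


if __name__ == "__main__":
    print(translate_tokens(["কমল", "পানী"], PHRASE_TABLE, CHAR_MAP))
```

Here the first token is absent from the toy table, so it is transliterated character by character, while the second is translated directly. A practical transliteration model would be trained on name pairs and would handle vowel signs and conjuncts, which this sketch ignores.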

Index Terms

Computer Science
Information Sciences

Keywords

Assamese, English, Statistical Machine Translation, Transliteration, Corpus