Research Article

Improvement of the Results of Statistical Machine Translation System using Anusaaraka

by Shubhamay Sen, Sriram Chaudhury
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 85 - Number 14
Year of Publication: 2014
Authors: Shubhamay Sen, Sriram Chaudhury
DOI: 10.5120/14913-3521

Shubhamay Sen, Sriram Chaudhury. Improvement of the Results of Statistical Machine Translation System using Anusaaraka. International Journal of Computer Applications. 85, 14 (January 2014), 41-47. DOI=10.5120/14913-3521

@article{ 10.5120/14913-3521,
author = { Shubhamay Sen and Sriram Chaudhury },
title = { Improvement of the Results of Statistical Machine Translation System using Anusaaraka },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 85 },
number = { 14 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 41-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume85/number14/14913-3521/ },
doi = { 10.5120/14913-3521 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:02:29.745740+05:30
%A Shubhamay Sen
%A Sriram Chaudhury
%T Improvement of the Results of Statistical Machine Translation System using Anusaaraka
%J International Journal of Computer Applications
%@ 0975-8887
%V 85
%N 14
%P 41-47
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper describes an efficient experimental approach to improving the translation quality of a phrase-based statistical machine translation (SMT) system by drawing on insights from rule-based machine translation. As a first step, large and accurately designed linguistic resources, such as multiword bilingual dictionaries, are appended to the existing training corpus; this is expected to improve the phrase alignment quality and phrase coverage of the SMT system. Translation coverage can be improved further by enriching the dictionary with morpho-syntactic word forms of the foreign-language words, together with their translations in the native language, instead of root forms alone. In a real-time testing scenario, the test corpus may contain morphological extensions of a root word that standard dictionaries do not cover. Adding such dictionaries to the corpus enriches it and addresses the improper translations that previously arose when a morpho-syntactic extension occurred in place of the root form. As a further improvement, the proposed approach integrates the linguistic intelligence of Anusaaraka with the large computational capacity of SMT to achieve better translations. Anusaaraka is a machine translation system based on the grammatical rules of Panini's Astadhyayi, and it is particularly strong at English-Hindi phrase alignment. It obtains phrase pairs by comparing its output translation with an accurate manual translation and extracting the best possible option. The bilingual phrase pairs obtained in this way are highly accurate and, when appended to the training corpus of the SMT system, yield a better phrase alignment structure and hence better translation quality.
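
The two corpus-enrichment steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names `augment_corpus` and `agreed_phrases`, the romanized toy data, and the exact token-match agreement test are all assumptions made for illustration, and a real pipeline would write the augmented corpus to the parallel files consumed by the SMT toolkit.

```python
from typing import Iterable, List, Tuple

# (English segment, Hindi segment); Hindi is romanized here purely for readability.
Pair = Tuple[str, str]


def augment_corpus(corpus: Iterable[Pair], entries: Iterable[Pair]) -> List[Pair]:
    """Append dictionary entries to a parallel corpus as extra one-line segments.

    Entries already present in the corpus are skipped so they are not added twice.
    """
    augmented = list(corpus)
    seen = set(augmented)
    for entry in entries:
        if entry not in seen:
            augmented.append(entry)
            seen.add(entry)
    return augmented


def agreed_phrases(candidates: Iterable[Pair], reference: str) -> List[Pair]:
    """Keep candidate phrase pairs whose Hindi side occurs, token for token,
    as a contiguous span of the manual reference translation.

    Assumption: exact matching stands in for whatever comparison the
    rule-based system actually performs against the manual translation.
    """
    ref_tokens = reference.split()
    kept = []
    for en, hi in candidates:
        hi_tokens = hi.split()
        n = len(hi_tokens)
        windows = (ref_tokens[i:i + n] for i in range(len(ref_tokens) - n + 1))
        if n and any(window == hi_tokens for window in windows):
            kept.append((en, hi))
    return kept


# Hypothetical toy data.
corpus = [("the boy went home", "ladka ghar gaya")]
root_entries = [("boy", "ladka")]                       # root-form dictionary
inflected_entries = [("boys", "ladke"), ("went", "gaya")]  # morphological forms

augmented = augment_corpus(corpus, root_entries + inflected_entries)

# Candidate pairs as a rule-based system might propose them (hypothetical).
candidates = [("went home", "ghar gaya"), ("the boy", "ek ladka")]
accepted = agreed_phrases(candidates, reference="ladka ghar gaya")
training_data = augment_corpus(augmented, accepted)
```

In this sketch, only the candidate whose Hindi side appears in the manual translation is kept, so the pair `("the boy", "ek ladka")` is discarded before anything is appended to the training data.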

Index Terms

Computer Science
Information Sciences

Keywords

Statistical machine translation, Bilingual dictionary, Morphological dictionary, Anusaaraka, Phrasal, Phrase alignment