Research Article

Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation

by A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 2
Year of Publication: 2012
Authors: A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman
DOI: 10.5120/6877-9090

A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman. Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation. International Journal of Computer Applications 46, 2 (May 2012), 45-51. DOI=10.5120/6877-9090

@article{ 10.5120/6877-9090,
author = { A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman },
title = { Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 2 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 45-51 },
numpages = { 7 },
url = { https://ijcaonline.org/archives/volume46/number2/6877-9267/ },
doi = { 10.5120/6877-9090 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Smoothing techniques adjust the maximum likelihood estimates of probabilities to produce more accurate probabilities. This is one of the most important tasks when building a language model from a limited amount of training data. The main contribution of this paper is an analysis of the performance of different smoothing techniques on n-grams. We consider three of the most widely used smoothing algorithms for language modeling: Witten-Bell smoothing, Kneser-Ney smoothing, and Modified Kneser-Ney smoothing. For evaluation we use the BLEU (Bilingual Evaluation Understudy) and NIST (National Institute of Standards and Technology) scoring techniques. A detailed evaluation of these models is performed by comparing the automatically produced word alignments. We use the Moses Statistical Machine Translation System for our work (i.e., the Moses decoder, GIZA++, mkcls, SRILM, IRSTLM, Pharaoh, and the BLEU scoring tool). The machine translation approach has been tested on German-to-English and English-to-German tasks. The results we obtain are significantly better than those obtained with alternative approaches to machine translation. This paper addresses several aspects of Statistical Machine Translation (SMT), with emphasis on the architecture and modeling of an SMT system.
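To make the idea concrete (this sketch is not part of the paper): Witten-Bell smoothing interpolates the observed n-gram counts with a lower-order model, reserving probability mass in proportion to T(h), the number of distinct word types seen after history h. A minimal illustrative bigram version in Python, with the function name `witten_bell_bigram` chosen here for the example, could look like:

```python
from collections import Counter, defaultdict

def witten_bell_bigram(tokens):
    """Bigram LM with Witten-Bell smoothing:
    P(w | h) = (c(h, w) + T(h) * P_uni(w)) / (c(h) + T(h))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    hist = Counter(tokens[:-1])            # c(h): times h occurs as a history
    followers = defaultdict(set)
    for h, w in bigrams:
        followers[h].add(w)                # distinct continuation types after h
    total = sum(unigrams.values())

    def prob(w, h):
        t = len(followers[h])              # T(h)
        p_uni = unigrams[w] / total        # MLE unigram model used as backoff
        if hist[h] + t == 0:               # unseen history: back off entirely
            return p_uni
        return (bigrams[(h, w)] + t * p_uni) / (hist[h] + t)

    return prob
```

For any history seen in training, the smoothed probabilities still sum to one over the vocabulary, while unseen continuations such as P(a | a) above receive nonzero mass from the unigram backoff. Kneser-Ney differs mainly in discounting counts absolutely and in building the backoff distribution from continuation counts rather than raw unigram frequencies.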

References
  1. Machine Translation, Wikipedia, en.wikipedia.org/wiki/Machine_translation, [last access: 06-04-2012].
  2. F. J. Och and H. Ney (2004), “The alignment template approach to statistical machine translation”, Computational Linguistics, Vol. 30, No. 4.
  3. Ye-Yi Wang and Alex Waibel, (1997) “Decoding Algorithm in Statistical Machine Translation”.
  4. Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, (2001) “IBM Research Report Bleu: a Method for Automatic Evaluation of Machine Translation”, RC22176 (W0109-022).
  5. Enrique Alfonseca and Diana Perez, (2004) “Automatic Assessment of Open Ended Questions with a BLEU-inspired Algorithm and shallow NLP”.
  6. Josep M. Crego Clemente, (2008) “Architecture and Modeling for N-gram-based Statistical Machine Translation”.
  7. Pharaoh, www.isi.edu/licensed-sw/pharaoh/, [last access: 06-04-2012].
  8. Philipp Koehn, (2009) “Statistical Machine Translation System User Manual and Code Guide”, University of Edinburgh.
  9. K. A. Papineni, S. Roukos, T. Ward, W. J. Zhu, (2001) “BLEU: a method for automatic evaluation of machine translation”. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY.
  10. Timothy C. Bell, John G. Cleary, Ian H. Witten, (1990) "Text Compression", Prentice Hall.
  11. Reinhard Kneser and Hermann Ney, (1995) "Improved backing-off for m-gram language modeling", In Proceedings of ICASSP-95, Vol. 1, pp. 181–184.
  12. Stanley F. Chen and Joshua Goodman (1998), "An Empirical Study of Smoothing Techniques for Language Modeling", Computer Science Group, Harvard University, Cambridge, Massachusetts.
  13. Bayes, Thomas, and Price, Richard (1763). "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S.". Philosophical Transactions of the Royal Society of London 53: 370–418.
Index Terms

Computer Science
Information Sciences

Keywords

Machine Translation, SMT, Smoothing, n-gram, Parallel Corpora, BLEU, NIST