DOI: 10.5120/6877-9090
A. S. M. Mahmudul Hasan, Saria Islam and M. Arifur Rahman. "Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation". International Journal of Computer Applications 46(2):45-51, May 2012.
BibTeX:

@article{key:article,
  author  = {A. S. M. Mahmudul Hasan and Saria Islam and M. Arifur Rahman},
  title   = {Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation},
  journal = {International Journal of Computer Applications},
  year    = {2012},
  volume  = {46},
  number  = {2},
  pages   = {45-51},
  month   = {May}
}
Abstract
Smoothing techniques adjust the maximum likelihood estimate of probabilities to produce more accurate probabilities. This is one of the most important tasks when building a language model from a limited amount of training data. The main contribution of this paper is an analysis of the performance of different smoothing techniques on n-grams. We consider three of the most widely used smoothing algorithms for language modeling: Witten-Bell smoothing, Kneser-Ney smoothing, and modified Kneser-Ney smoothing. For evaluation we use the BLEU (Bilingual Evaluation Understudy) and NIST (National Institute of Standards and Technology) scoring techniques. A detailed evaluation of these models is performed by comparing the automatically produced word alignments. We use the Moses Statistical Machine Translation system for our work (i.e. the Moses decoder, GIZA++, mkcls, SRILM, IRSTLM, Pharaoh, and the BLEU scoring tool). The machine translation approach has been tested on German-to-English and English-to-German tasks. The results we obtain are significantly better than those obtained with alternative approaches to machine translation. This paper addresses several aspects of Statistical Machine Translation (SMT), with the emphasis on the architecture and modeling of an SMT system.
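To make the smoothing discussion concrete, below is a minimal sketch of interpolated Kneser-Ney smoothing for a bigram language model in Python. It only illustrates the general technique named in the abstract, not the SRILM/IRSTLM implementations used in the paper; the toy corpus, the discount value D = 0.75, and the function name `kneser_ney_bigram` are assumptions made for this example.

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney bigram model (illustrative sketch).

    P(w | v) = max(c(v, w) - D, 0) / c(v) + lambda(v) * P_cont(w)
    lambda(v) = D * |{w' : c(v, w') > 0}| / c(v)
    P_cont(w) = |{v' : c(v', w) > 0}| / (number of distinct bigram types)
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    histories = Counter(tokens[:-1])                  # c(v): counts of bigram histories
    followers = Counter(v for (v, _w) in bigrams)     # distinct continuations of each history
    preceders = Counter(w for (_v, w) in bigrams)     # distinct histories preceding each word
    total_bigram_types = len(bigrams)

    def prob(v, w):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = preceders[w] / total_bigram_types
        if histories[v] == 0:
            return p_cont                             # unseen history: fall back to continuation prob
        discounted = max(bigrams[(v, w)] - discount, 0.0) / histories[v]
        interpolation_weight = discount * followers[v] / histories[v]
        return discounted + interpolation_weight * p_cont

    return prob

# Tiny usage example on a toy corpus (assumed data, for illustration only).
tokens = "the cat sat on the mat the dog sat on the rug".split()
p = kneser_ney_bigram(tokens)
print(p("on", "the"))   # frequent bigram keeps most of its probability mass
print(p("on", "dog"))   # unseen bigram still receives non-zero probability
```

Witten-Bell smoothing follows the same interpolation skeleton but derives the weight from the number of distinct continuations of the history instead of subtracting a fixed discount, which is why the three methods compared in the paper can be swapped behind the same interface.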