Research Article

Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation

by A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 2
Year of Publication: 2012
Authors: A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman
DOI: 10.5120/6877-9090

A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman. Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation. International Journal of Computer Applications 46, 2 (May 2012), 45-51. DOI=10.5120/6877-9090

@article{ 10.5120/6877-9090,
author = { A. S. M Mahmudul Hasan, Saria Islam, M. Arifur Rahman },
title = { Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 2 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 45-51 },
numpages = { 7 },
url = { https://ijcaonline.org/archives/volume46/number2/6877-9267/ },
doi = { 10.5120/6877-9090 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Smoothing techniques adjust the maximum likelihood estimates of probabilities to produce more accurate probabilities. This is one of the most important tasks when building a language model from a limited amount of training data. The main contribution of this paper is an analysis of the performance of different smoothing techniques on n-grams. We consider three of the most widely used smoothing algorithms for language modeling: Witten-Bell smoothing, Kneser-Ney smoothing, and Modified Kneser-Ney smoothing. For evaluation we use the BLEU (Bilingual Evaluation Understudy) and NIST (National Institute of Standards and Technology) scoring techniques. A detailed evaluation of these models is performed by comparing the automatically produced word alignments. We use the Moses Statistical Machine Translation System for our work (i.e., the Moses decoder, GIZA++, mkcls, SRILM, IRSTLM, Pharaoh, and the BLEU scoring tool). The machine translation approach has been tested on German-to-English and English-to-German tasks. The results we obtain are significantly better than those obtained with alternative approaches to machine translation. This paper addresses several aspects of Statistical Machine Translation (SMT), with emphasis on the architecture and modeling of an SMT system.
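To make the idea concrete (this sketch is not part of the paper): Witten-Bell smoothing interpolates the observed n-gram counts with a lower-order model, reserving probability mass in proportion to T(h), the number of distinct word types seen after history h. A minimal illustrative bigram version in Python, with the function name `witten_bell_bigram` chosen here for the example, could look like:

```python
from collections import Counter, defaultdict

def witten_bell_bigram(tokens):
    """Bigram LM with Witten-Bell smoothing:
    P(w | h) = (c(h, w) + T(h) * P_uni(w)) / (c(h) + T(h))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    hist = Counter(tokens[:-1])            # c(h): times h occurs as a history
    followers = defaultdict(set)
    for h, w in bigrams:
        followers[h].add(w)                # distinct continuation types after h
    total = sum(unigrams.values())

    def prob(w, h):
        t = len(followers[h])              # T(h)
        p_uni = unigrams[w] / total        # MLE unigram model used as backoff
        if hist[h] + t == 0:               # unseen history: back off entirely
            return p_uni
        return (bigrams[(h, w)] + t * p_uni) / (hist[h] + t)

    return prob
```

For any history seen in training, the smoothed probabilities still sum to one over the vocabulary, while unseen continuations such as P(a | a) above receive nonzero mass from the unigram backoff. Kneser-Ney differs mainly in discounting counts absolutely and in building the backoff distribution from continuation counts rather than raw unigram frequencies.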

References
  1. Machine Translation, Wikipedia, en.wikipedia.org/wiki/Machine_translation, [last access: 06-04-2012].
  2. F. J. Och and H. Ney (2004), “The alignment template approach to statistical machine translation”, Computational Linguistics, Vol. 30, No. 4.
  3. Ye-Yi Wang and Alex Waibel, (1997) “Decoding Algorithm in Statistical Machine Translation”.
  4. Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu, (2001) “IBM Research Report Bleu: a Method for Automatic Evaluation of Machine Translation”, RC22176 (W0109-022).
  5. Enrique Alfonseca and Diana Perez, (2004) “Automatic Assessment of Open Ended Questions with a BLEU-inspired Algorithm and shallow NLP”.
  6. Josep M. Crego Clemente, (2008) “Architecture and Modeling for N-gram-based Statistical Machine Translation”.
  7. Pharaoh, www.isi.edu/licensed-sw/pharaoh/, [last access: 06-04-2012].
  8. Philipp Koehn, (2009) “Statistical Machine Translation System User Manual and Code Guide”, University of Edinburgh.
  9. K. A. Papineni, S. Roukos, T. Ward, W. J. Zhu, (2001) “BLEU: a method for automatic evaluation of machine translation”. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY.
  10. Timothy C. Bell, John G. Cleary, Ian H. Witten, (1990) "Text Compression", Prentice Hall.
  11. Reinhard Kneser and Hermann Ney, (1995) "Improved backing-off for m-gram language modeling", In Proceedings of ICASSP-95, Vol. 1, pp. 181–184.
  12. Stanley F. Chen and Joshua Goodman (1998), "An Empirical Study of Smoothing Techniques for Language Modeling", Computer Science Group, Harvard University, Cambridge, Massachusetts.
  13. Bayes, Thomas, and Price, Richard (1763). "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S.". Philosophical Transactions of the Royal Society of London 53: 370–418.
Index Terms

Computer Science
Information Sciences

Keywords

Machine Translation, SMT, Smoothing, n-gram, Parallel Corpora, BLEU, NIST