Research Article

Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration

by M. L. Dhore, S. K. Dixit, T. D. Sonwalkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 55 - Number 10
Year of Publication: 2012
10.5120/8791-2776

M. L. Dhore, S. K. Dixit, T. D. Sonwalkar. Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration. International Journal of Computer Applications. 55, 10 (October 2012), 28-34. DOI=10.5120/8791-2776

@article{ 10.5120/8791-2776,
author = { M. L. Dhore and S. K. Dixit and T. D. Sonwalkar },
title = { Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration },
journal = { International Journal of Computer Applications },
issue_date = { October 2012 },
volume = { 55 },
number = { 10 },
month = { October },
year = { 2012 },
issn = { 0975-8887 },
pages = { 28-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume55/number10/8791-2776/ },
doi = { 10.5120/8791-2776 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:56:53.753958+05:30
%A M. L. Dhore
%A S. K. Dixit
%A T. D. Sonwalkar
%T Analyzing Probability Vectors for Named Entity Statistical Machine Transliteration
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 10
%P 28-34
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Machine transliteration systems are classified as either rule-based or statistical. A rule-based method transliterates names using a large set of hand-crafted rules. Such systems are simple to implement but require a huge amount of language expertise. Statistical methods recast transliteration as a classification problem and employ a statistical model to solve it. Although these methods do not require expert linguistic knowledge, they need large amounts of bilingual data and a good training algorithm. Currently, the basic Markov Chain Model (MM), Extended Markov Chain (EMC), Hidden Markov Model (HMM), Conditional Random Fields (CRF), Decision Tree (DT), Maximum Entropy Markov Model (MEMM) and Support Vector Machine (SVM) are the popular statistical approaches used by researchers worldwide. This paper presents a mathematical analysis of the statistical approaches used in machine transliteration of named entities, intended to help new researchers understand the mathematics behind these methods.
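
As a rough illustration of what treating transliteration as a classification problem can look like, the sketch below estimates a probability vector P(target unit | source unit) by relative frequency from aligned name pairs and then labels each source unit with its most probable target unit. It is a minimal sketch under stated assumptions, not the method analyzed in the paper: the function names `train` and `transliterate`, the unit segmentation, and the toy aligned pairs are all invented for illustration.

```python
from collections import Counter, defaultdict


def train(aligned_pairs):
    """Estimate P(target_unit | source_unit) by relative frequency."""
    counts = defaultdict(Counter)
    for source_units, target_units in aligned_pairs:
        # Assumes a one-to-one alignment between source and target units.
        for s, t in zip(source_units, target_units):
            counts[s][t] += 1
    return {
        s: {t: c / sum(target_counts.values()) for t, c in target_counts.items()}
        for s, target_counts in counts.items()
    }


def transliterate(source_units, model):
    """Label each source unit with its most probable target unit."""
    return [max(model[s], key=model[s].get) for s in source_units]


# Toy, invented alignments (hypothetical data, not taken from the paper).
toy_pairs = [
    (["ra", "m"], ["raa", "m"]),
    (["ra", "vi"], ["ra", "vi"]),
    (["ra", "j"], ["raa", "j"]),
]

model = train(toy_pairs)
result = transliterate(["ra", "vi"], model)
print(model["ra"])  # probability vector over target units for source unit "ra"
print(result)
```

This context-free classifier ignores neighboring units. The Markov-style models named in the abstract differ mainly in how they condition each decision on surrounding source or target units.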

Index Terms

Computer Science
Information Sciences

Keywords

Conditional Random Fields, Decision Trees, Hidden Markov Model, Markov Chain, Statistical Machine Transliteration