Research Article

Review of Stochastic POS Tagging Techniques used in Bengali

by Abul Kalam Md. Rajib Hasan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 102 - Number 8
Year of Publication: 2014
Authors: Abul Kalam Md. Rajib Hasan
10.5120/17838-8724

Abul Kalam Md. Rajib Hasan. Review of Stochastic POS Tagging Techniques used in Bengali. International Journal of Computer Applications. 102, 8 (September 2014), 35-39. DOI=10.5120/17838-8724

@article{ 10.5120/17838-8724,
author = { Abul Kalam Md. Rajib Hasan },
title = { Review of Stochastic POS Tagging Techniques used in Bengali },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 102 },
number = { 8 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 35-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume102/number8/17838-8724/ },
doi = { 10.5120/17838-8724 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:32:36.755240+05:30
%A Abul Kalam Md. Rajib Hasan
%T Review of Stochastic POS Tagging Techniques used in Bengali
%J International Journal of Computer Applications
%@ 0975-8887
%V 102
%N 8
%P 35-39
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we describe the stochastic methods used for part-of-speech (POS) tagging of the Bengali language and present a generalized stochastic model for POS tagging in Bengali. We review the kinds of corpora and the number of tags used by these tagging methods. The study finds as many as 45 useful tags in the literature, along with four useful corpora. Because Bengali is a morphologically rich language, we outline a feature list that could be used with different training algorithms. We found that a hybrid hidden Markov model (HMM) combined with a morphological analyzer works best for Bengali, with an accuracy of 96.3%.
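A generalized stochastic tagger of the kind the abstract refers to can be illustrated with a bigram HMM decoded by the Viterbi algorithm. The sketch below is not the author's implementation; it rests on several assumptions. The three-sentence romanized Bengali corpus and the tag names (PRP, NN, VM) are hypothetical toy data, add-one smoothing is an arbitrary choice, and the longest-suffix fallback for unknown words (the hypothetical `MAX_SUFFIX` constant and the `train_hmm`/`viterbi` helpers) is only a crude stand-in for the morphological analyzer that the hybrid model pairs with the HMM. It does not reimplement that analyzer.

```python
import math
from collections import Counter, defaultdict

MAX_SUFFIX = 3  # hypothetical: longest word ending used to score unknown words


def train_hmm(tagged_sents):
    """Collect bigram HMM counts from sentences of (word, tag) pairs."""
    trans = defaultdict(Counter)   # trans[previous_tag][tag]
    emit = defaultdict(Counter)    # emit[tag][word]
    suffix = defaultdict(Counter)  # suffix[word_ending][tag]
    for sent in tagged_sents:
        prev = "<s>"  # sentence-start state
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            for k in range(1, min(MAX_SUFFIX, len(word)) + 1):
                suffix[word[-k:]][tag] += 1
            prev = tag
    return trans, emit, suffix


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    trans, emit, suffix = model
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    tag_total = {t: sum(emit[t].values()) for t in tags}
    n_tokens = sum(tag_total.values())

    def log_trans(prev, tag):
        # add-one smoothed log P(tag | prev)
        counts = trans[prev]
        return math.log((counts[tag] + 1) / (sum(counts.values()) + len(tags)))

    def log_emit(word, tag):
        if word in vocab:
            # add-one smoothed log P(word | tag), with one extra slot for unseen words
            return math.log((emit[tag][word] + 1) / (tag_total[tag] + len(vocab) + 1))
        # Unknown word (assumed heuristic): P(tag | longest seen ending) / P(tag), which is
        # roughly proportional to P(ending | tag); the missing factor is the same for every
        # tag at this position, so the best path is unaffected.
        for k in range(min(MAX_SUFFIX, len(word)), 0, -1):
            dist = suffix.get(word[-k:])
            if dist:
                p_tag = (dist[tag] + 1) / (sum(dist.values()) + len(tags))
                return math.log(p_tag) - math.log(tag_total[tag] / n_tokens)
        return math.log(1 / len(tags))  # no known ending: same score for every tag

    # best[i][t] = (log score of the best path ending in tag t at position i, previous tag)
    best = [{t: (log_trans("<s>", t) + log_emit(words[0], t), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(((p, best[-1][p][0] + log_trans(p, t)) for p in tags),
                              key=lambda pair: pair[1])
            column[t] = (score + log_emit(word, t), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy corpus: romanized Bengali, tag names borrowed for illustration only.
corpus = [
    [("ami", "PRP"), ("bhat", "NN"), ("khai", "VM")],  # "I eat rice"
    [("se", "PRP"), ("boi", "NN"), ("pore", "VM")],     # "he/she reads a book"
    [("ami", "PRP"), ("boi", "NN"), ("pori", "VM")],    # "I read a book"
]
model = train_hmm(corpus)
print(viterbi("ami jol khai".split(), model))  # ['PRP', 'NN', 'VM']  ("jol" is unseen)
print(viterbi("se kaj kore".split(), model))   # ['PRP', 'NN', 'VM']  ("kore" shares "ore" with "pore")
```

Scores are kept in log space so that long sentences do not underflow. The suffix table is where a feature list for a morphologically rich language would plug in; a real system would replace it with richer features or an actual morphological analyzer.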

Index Terms

Computer Science
Information Sciences

Keywords

Natural Language Processing (NLP), Machine Learning