Research Article

CRF based Part of Speech Tagger for Domain Specific Hindi Corpus

Published on April 2017 by Vaishali Gupta, Nisheeth Joshi, Iti Mathur
National Conference on Contemporary Computing
Foundation of Computer Science USA
NCCC2016 - Number 2
April 2017

Vaishali Gupta, Nisheeth Joshi, Iti Mathur. CRF based Part of Speech Tagger for Domain Specific Hindi Corpus. National Conference on Contemporary Computing. NCCC2016, 2 (April 2017), 14-18.

@article{
author = { Vaishali Gupta, Nisheeth Joshi, Iti Mathur },
title = { CRF based Part of Speech Tagger for Domain Specific Hindi Corpus },
journal = { National Conference on Contemporary Computing },
issue_date = { April 2017 },
volume = { NCCC2016 },
number = { 2 },
month = { April },
year = { 2017 },
issn = { 0975-8887 },
pages = { 14-18 },
numpages = 5,
url = { /proceedings/nccc2016/number2/27343-6348/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Contemporary Computing
%A Vaishali Gupta
%A Nisheeth Joshi
%A Iti Mathur
%T CRF based Part of Speech Tagger for Domain Specific Hindi Corpus
%J National Conference on Contemporary Computing
%@ 0975-8887
%V NCCC2016
%N 2
%P 14-18
%D 2017
%I International Journal of Computer Applications
Abstract

Natural language processing (NLP) is a field of artificial intelligence and computational linguistics concerned with the interactions between human (natural) languages and computers, and it is closely related to human–computer interaction. NLP involves several phases, and part-of-speech (POS) tagging is one of the essential ones. A POS tagger is an important tool for building language translators and information extraction systems. The tagging problem is to assign a tag (annotation) to every word in a sentence. This study presents a POS tagger for domain-specific Hindi text. The system is built with a Conditional Random Field (CRF) model and evaluated on the agricultural domain of a Hindi corpus.
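
As a rough illustration of how a linear-chain CRF assigns one tag to every word, the sketch below decodes the best tag sequence with the Viterbi algorithm. It is not the authors' system: the tag names (NN, VM, VAUX), the example sentence, and all weights are assumptions chosen by hand for the illustration, whereas a trained CRF would learn its weights from an annotated corpus and use far richer features.

```python
from typing import Dict, List, Tuple

# Hypothetical tag inventory, used only for this illustration.
TAGS: List[str] = ["NN", "VM", "VAUX"]

# Hand-set (word, tag) scores; a real CRF learns these from annotated data.
EMISSION: Dict[Tuple[str, str], float] = {
    ("किसान", "NN"): 2.0,
    ("खेत", "NN"): 2.0,
    ("जोतता", "VM"): 2.0,
    ("है", "VAUX"): 1.5,
    ("है", "VM"): 1.0,
}

# Hand-set (previous tag, current tag) scores, also illustrative.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("NN", "NN"): 0.5,
    ("NN", "VM"): 1.0,
    ("VM", "VAUX"): 1.5,
}

SENTENCE: List[str] = ["किसान", "खेत", "जोतता", "है"]


def viterbi(words: List[str]) -> List[Tuple[str, str]]:
    """Return the highest-scoring tag sequence for a non-empty word list."""
    # best[i][tag] = (score of best path ending in tag at position i, previous tag)
    best = [{t: (EMISSION.get((words[0], t), 0.0), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            emit = EMISSION.get((words[i], t), 0.0)
            # Pick the previous tag that maximizes path score plus transition score.
            prev, score = max(
                ((p, best[i - 1][p][0] + TRANSITION.get((p, t), 0.0)) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit, prev)
        best.append(column)

    # Backtrack from the best final tag to recover the full sequence.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    path.reverse()
    return list(zip(words, path))


tagged = viterbi(SENTENCE)
```

With these toy weights, the decoder labels the two nouns as NN, the main verb as VM, and the auxiliary as VAUX. The transition score between adjacent tags is what separates a sequence model such as a CRF from tagging each word independently.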

Index Terms

Computer Science
Information Sciences

Keywords

POS Tagger Corpus CRF Model File AI Agriculture Domain.