
CRF based Part of Speech Tagger for Domain Specific Hindi Corpus

IJCA Proceedings on National Conference on Contemporary Computing
© 2017 by IJCA Journal
NCCC 2016 - Number 2
Year of Publication: 2017
Vaishali Gupta
Nisheeth Joshi
Iti Mathur

Vaishali Gupta, Nisheeth Joshi and Iti Mathur. Article: CRF based Part of Speech Tagger for Domain Specific Hindi Corpus. IJCA Proceedings on National Conference on Contemporary Computing NCCC 2016(2):14-18, April 2017. Full text available. BibTeX

	author = {Vaishali Gupta and Nisheeth Joshi and Iti Mathur},
	title = {Article: CRF based Part of Speech Tagger for Domain Specific Hindi Corpus},
	journal = {IJCA Proceedings on National Conference on Contemporary Computing},
	year = {2017},
	volume = {NCCC 2016},
	number = {2},
	pages = {14-18},
	month = {April},
	note = {Full text available}


Natural language processing (NLP) is a field of artificial intelligence and computational linguistics concerned with the interactions between human (natural) languages and computers, and it is closely related to human–computer interaction. NLP involves several phases, and part-of-speech (POS) tagging is one of the essential ones. A POS tagger is an important tool in the development of language translation and information extraction systems. The tagging problem is to assign a tag (annotation) to every word in a sentence. This study presents a POS tagger for domain-specific Hindi. The system is evaluated on the agricultural domain of a Hindi corpus using a Conditional Random Field (CRF) model.
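To make the tagging problem concrete, the following is a minimal sketch of how a linear-chain CRF picks one tag per word: each candidate tag sequence is scored by summing per-word (emission) and tag-to-tag (transition) weights, and Viterbi decoding returns the highest-scoring sequence. Everything specific here is an assumption made for illustration and does not come from the paper: the three-tag set (`NN`, `PSP`, `VM`), the example sentence, and the hand-set weights in `EMIT` and `TRANS`. A trained CRF would learn such weights from an annotated corpus and would normally use much richer features.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set for illustration: noun, postposition, main verb.
TAGS: List[str] = ["NN", "PSP", "VM"]

# Hypothetical hand-set weights; a real CRF learns these from annotated data.
# Emission weight: how strongly a word supports a tag.
EMIT: Dict[Tuple[str, str], float] = {
    ("किसान", "NN"): 2.0,
    ("खेत", "NN"): 2.0,
    ("में", "PSP"): 2.0,
    ("गया", "VM"): 2.0,
}
# Transition weight: how strongly one tag supports the next tag.
TRANS: Dict[Tuple[str, str], float] = {
    ("NN", "NN"): 0.5,
    ("NN", "PSP"): 1.0,
    ("NN", "VM"): 0.5,
    ("PSP", "VM"): 1.0,
}


def viterbi(words: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under the toy weights."""
    if not words:
        return []
    # score[i][t]: best score of any tag sequence for words[0..i] ending in t.
    score = [{t: EMIT.get((words[0], t), 0.0) for t in TAGS}]
    # back[i][t]: the previous tag on that best sequence.
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[i - 1][p] + TRANS.get((p, t), 0.0))
            score[i][t] = (
                score[i - 1][prev]
                + TRANS.get((prev, t), 0.0)
                + EMIT.get((words[i], t), 0.0)
            )
            back[i][t] = prev
    # Follow back-pointers from the best final tag to recover the sequence.
    path = [max(TAGS, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # "The farmer went into the field."
    sentence = ["किसान", "खेत", "में", "गया"]
    for word, tag in zip(sentence, viterbi(sentence)):
        print(word, tag)
```

With these toy weights the sentence decodes to `NN NN PSP VM`. Unknown words receive an emission weight of 0 for every tag, so the transition weights alone decide their tags, which is one reason practical taggers add features such as suffixes and neighbouring words.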

