Research Article

An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering

by M. Kasthuri, S. Britto Ramesh Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 13
Year of Publication: 2014
Authors: M. Kasthuri, S. Britto Ramesh Kumar
DOI: 10.5120/16406-6114

M. Kasthuri, S. Britto Ramesh Kumar. An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering. International Journal of Computer Applications. 94, 13 (May 2014), 36-41. DOI=10.5120/16406-6114

@article{ 10.5120/16406-6114,
author = { M. Kasthuri and S. Britto Ramesh Kumar },
title = { An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 13 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 36-41 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume94/number13/16406-6114/ },
doi = { 10.5120/16406-6114 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:17:35.491937+05:30
%A M. Kasthuri
%A S. Britto Ramesh Kumar
%T An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 13
%P 36-41
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Stemming is an important step in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks. It is usually done by removing attached suffixes and prefixes (affixes) from index terms before the terms are assigned to the index. Stemming is a pre-processing step in Text Mining applications and a basic requirement in areas such as computational linguistics and information retrieval, where it is used to improve recall. This paper proposes an improved rule-based iterative affix-stripping algorithm that produces stemmed Tamil words in fewer computational steps. Further, the K-Means clustering algorithm is used to cluster the stemmed Tamil words in order to improve the performance of Tamil-language Information Retrieval and Extraction. The experimental analysis shows that words stemmed after clustering give better results than words stemmed before clustering.
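
The idea of iterative affix stripping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' rule set or algorithm: the suffix list, the transliterated example word, the function name `strip_affixes`, and the `min_stem_len` guard are all assumptions made for illustration. The sketch repeatedly removes the longest matching affix until no rule applies or the stem would become too short.

```python
# Hypothetical transliterated suffixes, chosen only for illustration.
# They are NOT the rule set used in the paper.
SUFFIXES = ("kal", "il", "ai", "ukku")


def strip_affixes(word, suffixes=SUFFIXES, prefixes=(), min_stem_len=3):
    """Iteratively strip affixes from `word`, longest match first.

    Each pass removes at most one suffix and one prefix. The loop stops
    when a full pass changes nothing. `min_stem_len` (an assumed guard)
    prevents stripping a word down to fewer than that many characters.
    """
    # Ignore empty affixes so that every successful strip shortens the stem.
    suffixes = sorted((s for s in suffixes if s), key=len, reverse=True)
    prefixes = sorted((p for p in prefixes if p), key=len, reverse=True)

    stem = word
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem_len:
                stem = stem[: -len(suffix)]
                changed = True
                break
        for prefix in prefixes:
            if stem.startswith(prefix) and len(stem) - len(prefix) >= min_stem_len:
                stem = stem[len(prefix):]
                changed = True
                break
    return stem


if __name__ == "__main__":
    # "veedukalil" is treated here as veedu + kal + il under the assumed rules.
    print(strip_affixes("veedukalil"))
```

With these assumed rules, the first pass removes `il`, the second removes `kal`, and the third finds nothing to strip, so the loop ends. Stems produced this way could then be grouped by a clustering step such as K-Means; the features used for clustering are not stated in the abstract, so none are assumed here.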

Index Terms

Computer Science
Information Sciences

Keywords

Tamil morphology, Transliteration, Tamil stemmer, Improved affix stemmer, Natural Language Processing