Research Article

Stemming Techniques and Naive Approach for Gujarati Stemmer

Published in February 2013 by Jikitsha R. Sheth, Bankim C. Patel
International Conference on Recent Trends in Information Technology and Computer Science 2012
Foundation of Computer Science USA
ICRTITCS2012 - Number 2

Jikitsha R. Sheth, Bankim C. Patel. Stemming Techniques and Naive Approach for Gujarati Stemmer. International Conference on Recent Trends in Information Technology and Computer Science 2012. ICRTITCS2012, 2 (February 2013), 9-11.

@article{
author = { Jikitsha R. Sheth and Bankim C. Patel },
title = { Stemming Techniques and Naive Approach for Gujarati Stemmer },
journal = { International Conference on Recent Trends in Information Technology and Computer Science 2012 },
issue_date = { February 2013 },
volume = { ICRTITCS2012 },
number = { 2 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 9-11 },
numpages = 3,
url = { /proceedings/icrtitcs2012/number2/10253-1334/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Recent Trends in Information Technology and Computer Science 2012
%A Jikitsha R. Sheth
%A Bankim C. Patel
%T Stemming Techniques and Naive Approach for Gujarati Stemmer
%J International Conference on Recent Trends in Information Technology and Computer Science 2012
%@ 0975-8887
%V ICRTITCS2012
%N 2
%P 9-11
%D 2013
%I International Journal of Computer Applications
Abstract

This paper focuses on stemming techniques that various researchers have used for the English language. It further reviews the approaches used for Hindi and Bengali. The authors propose a model for a Gujarati stemmer that incorporates both traditional and naïve approaches.
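To make the idea of stemming concrete, the Python sketch below shows two ideas in miniature: traditional longest-match suffix stripping, which removes at most one suffix from a word, and a naive variant that, as an assumption here, simply lists every (stem, suffix) split the suffix list allows and leaves the choice to a later stage. The suffix list `SUFFIXES` (a few Gujarati case markers such as "માં" "in" and "થી" "from", plus the plural marker "ઓ"), the `MIN_STEM` threshold, and the function names are all hypothetical. This is not the model proposed in the paper, whose details the abstract does not give.

```python
# Hypothetical, illustrative suffix list: a few common Gujarati case markers
# and the plural marker. This is NOT the suffix list used in the paper.
SUFFIXES = ["ઓમાં", "ઓને", "માં", "થી", "ને", "નો", "ની", "નું", "ના", "ઓ"]

# Assumed minimum stem length (in Unicode code points) so that a word is
# never stripped down to almost nothing.
MIN_STEM = 2

# Try longer suffixes first so that "ઓમાં" wins over "માં" (longest match).
_BY_LENGTH = sorted(SUFFIXES, key=len, reverse=True)


def strip_longest_suffix(word: str) -> str:
    """Traditional longest-match suffix stripping: remove at most one suffix."""
    for suffix in _BY_LENGTH:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word  # no listed suffix applies, so the word is its own stem


def naive_splits(word: str) -> list[tuple[str, str]]:
    """Naive variant (an assumption): return every (stem, suffix) split that
    the suffix list allows, plus the unsplit word, without choosing one."""
    splits = [
        (word[: -len(suffix)], suffix)
        for suffix in _BY_LENGTH
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM
    ]
    splits.append((word, ""))  # the whole word with an empty suffix
    return splits


if __name__ == "__main__":
    for w in ["છોકરાઓમાં", "ઘરમાં", "ઘરથી", "ઘર"]:
        print(w, "->", strip_longest_suffix(w), naive_splits(w))
```

For example, `strip_longest_suffix("છોકરાઓમાં")` ("among the boys") tries "ઓમાં" before "માં" and returns "છોકરા", while `naive_splits` on the same word keeps both candidate stems, "છોકરા" and "છોકરાઓ", along with the unsplit word.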

Index Terms

Computer Science
Information Sciences

Keywords

Morphemes, Stemmer, Indian Languages, Gujarati