Design and Development of a Stemmer for Punjabi

International Journal of Computer Applications
© 2010 by IJCA Journal
Number 12 - Article 5
Year of Publication: 2010
Authors:
Dinesh Kumar
Prince Rana
10.5120/1634-2196

Dinesh Kumar and Prince Rana. Design and Development of a Stemmer for Punjabi. International Journal of Computer Applications 11(12):18–23, December 2010. Published by Foundation of Computer Science.

BibTeX

@article{key:article,
	author = {Dinesh Kumar and Prince Rana},
	title = {Design and Development of a Stemmer for Punjabi},
	journal = {International Journal of Computer Applications},
	year = {2010},
	volume = {11},
	number = {12},
	pages = {18--23},
	month = {December},
	note = {Published By Foundation of Computer Science}
}

Abstract

Stemming is the process of removing affixes from inflected words without performing a complete morphological analysis. A stemming algorithm is a procedure that reduces all words with the same stem to a common form [20]. It is useful in many areas of computational linguistics and information retrieval. Search engines use this technique so that a query can match documents containing other inflected forms of the same word. The algorithm is the basic building block of a stemmer, and stemmers are mainly used in information retrieval systems to improve performance. This paper presents a stemmer for Punjabi that uses a brute-force algorithm together with a suffix-stripping technique. Similar techniques can be used to build stemmers for other languages such as Hindi, Bengali and Marathi. The stemmer gives good results and can be effective in information retrieval systems. It also reduces the problems of over-stemming and under-stemming.
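
The combination of a brute-force step and a suffix-stripping step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the brute-force step is an exact-match lookup table, and the table entries, the suffix list, the minimum stem length and the romanized word forms are all hypothetical toy data chosen for readability.

```python
# Hypothetical lookup table for the brute-force step: inflected form -> stem.
# Entries are romanized toy examples, not data from the paper.
LOOKUP = {
    "munde": "munda",
    "mundian": "munda",
}

# Hypothetical suffix list for the suffix-stripping step, longest first.
SUFFIXES = ["an", "e"]

# Assumed guard: never strip a suffix if fewer than this many
# characters would remain (a simple check against over-stemming).
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Return a stem for `word` using table lookup, then suffix stripping."""
    # Step 1 (brute force): exact match against the lookup table.
    if word in LOOKUP:
        return LOOKUP[word]
    # Step 2 (fallback): strip the first suffix that matches and
    # still leaves a stem of acceptable length.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    # No rule applies: the word is returned unchanged.
    return word


if __name__ == "__main__":
    for w in ["munde", "kitaban", "kurian", "kitab"]:
        print(w, "->", stem(w))
```

In this sketch the table handles forms whose stem cannot be recovered by deletion alone ("munde" maps to "munda", whereas stripping "e" would give "mund"), while the suffix rules cover regular forms that are absent from the table.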

References

  • Julie Beth Lovins (1968) "Development of a Stemming Algorithm", Mechanical Translation and Computational Linguistics, Vol. 11, No. 1, pp. 22-31.
  • M. F. Porter (1980) "An algorithm for suffix stripping", Program, Vol. 14, No. 3, pp. 130-137. URL: http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf
  • David A. Hull and Gregory Grefenstette (1996) "A Detailed Analysis of English Stemming Algorithms", Rank Xerox Research Centre, 6 chemin de Maupertuis, 38240 Meylan, France, pp. 1-16.
  • Jörg Caumanns (1998) "A Fast and Simple Stemming Algorithm for German Words", Department of Computer Science, Free University of Berlin, pp. 1-10.
  • Tanja Gaustad and Gosse Bouma (2000) "Accurate Stemming of Dutch for Text Classification", Language Computing, Vol. 45, No. 1, pp. 104-117.
  • Bal Krishna Bal and Prajol Shrestha (2004) "A Morphological Analyzer and a Stemmer for Nepali", PAN Localization, Working Papers 2004-2007, pp. 324-31.
  • Md. Zahurul Islam, Md. Nizam Uddin and Mumit Khan (2004) "A Light Weight Stemmer for Bengali and Its Use in Spelling Checker", Proceedings of the 1st International Conference on Digital Communications and Computer Applications (DCCA2007), Irbid, Jordan, pp. 87-93.
  • Haidar Harmanani, Walid Keirouz and Saeed Raheel (2006) "A rule base extensible stemmer for Information retrieval with application to Arabic", The International Arab Journal of Information Technology, Vol. 3, No. 3, pp. 265-272.
  • Ababneh Mohammad, Oqeili Saleh and Rawan A. Abdeen (2006) "Occurrences Algorithm for String Searching Based on Brute-force Algorithm", Jordan Journal of Computer Science, Vol. 2, No. 1, pp. 82-85.
  • Jiaul H. Paik and Swapan K. Parui (2008) "A Simple Stemmer for Inflectional Languages", Journal of Documentation, Vol. 61, No. 4, pp. 476-496.
  • Muhamad Taufik Abdullah, Fatimah Ahmad, Ramlan Mahmod and Tengku Mohd Tengku Sembok (2009) "Rules Frequency Order Stemmer for Malay Language", IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 2, pp. 433-438.
  • Miguel E. Ruiz and Bharath Dandala (2010) "Evaluating Stemmers and Retrieval Fusion Approaches for Hindi: UNT at FIRE 2010". URL: http://www.isical.ac.in/~fire/paper_2010/MiguelRuiz-unt-fire-2010.pdf
  • Ananthakrishnan Ramanathan and Durgesh D. Rao, "A Lightweight Stemmer for Hindi", Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Workshop on Computational Linguistics for South Asian Languages (Budapest, Apr.), pp. 42-48.
  • Abduelbaset M. Goweder, Husien A. Alhammi, Tarik Rashed and Abdulsalam Musrati, "A Hybrid Method for Stemming Arabic Text", Journal of Computer Science. URL: http://eref.uqu.edu.sa/files/eref2/folder6/f181.pdf
  • Samir Abdou, Patrick Ruch and Jacques Savoy (2005) "Evaluation of Stemming, Query Expansion and Manual Indexing Approaches for the Genomic Task", NIST Special Publication SP 500-266: The Fourteenth Text Retrieval Conference (TREC 2005) Proceedings. URL: http://trec.nist.gov/pubs/trec14/papers/uneuchatel.geo.pdf
  • James Mayfield and Paul McNamee, "Single N-gram Stemming", SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 28 - August 1, 2003, Toronto, Canada, pp. 415-416.
  • Marie-Claire Jenkins and Dan Smith, "Conservative stemming for search and indexing", School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK. URL: www.uea.ac.uk/polopoly_fs/1.85493!stemmer25feb.pdf
  • Hayder K. Al Ameed, Shaikha O. Al Ketbi, Amna A. Al Kaabi, Khadija S. Al Shebli, Naila F. Al Shamsi, Noura H. Al Nuaimi and Shaikha S. Al Muhairi, "Arabic Light Stemmer: A New Enhanced Approach", The Second International Conference on Innovations in Information Technology (IIT'05). URL: http://www.onlinelibrary.wiley.com/doi/10.1002/asi.21247/pdf
  • Ghassan Kanan, Mohammad Ababney, Riyad Al Shalabi and Alla Nal Nobani, "Building an effective rule based light Stemmer Arabic language to improve search effectiveness", Arab Academy for Banking and Financial Science, Al Balka Applied University, pp. 312-316. URL: www.ccis2K.org/iajit/PDF/vol.3, no.3no.3/12-Haidar.pdf
  • www.wikipedia.com.
  • http://www.comp.lancs.ac.uk.