Research Article

Article:Design and Development of a Stemmer for Punjabi

by Dinesh Kumar, Prince Rana
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 12
Year of Publication: 2010
Authors: Dinesh Kumar, Prince Rana
10.5120/1634-2196

Dinesh Kumar, Prince Rana. Article:Design and Development of a Stemmer for Punjabi. International Journal of Computer Applications. 11, 12 (December 2010), 18-23. DOI=10.5120/1634-2196

@article{ 10.5120/1634-2196,
author = { Dinesh Kumar and Prince Rana },
title = { Article:Design and Development of a Stemmer for Punjabi },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 11 },
number = { 12 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 18-23 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume11/number12/1634-2196/ },
doi = { 10.5120/1634-2196 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:00:24.250425+05:30
%A Dinesh Kumar
%A Prince Rana
%T Article:Design and Development of a Stemmer for Punjabi
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 12
%P 18-23
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Stemming is the process of removing affixes from inflected words without performing a complete morphological analysis. A stemming algorithm is a procedure that reduces all words with the same stem to a common form [20]. It is useful in many areas of computational linguistics and information retrieval. Search engines use stemming so that a query can match documents containing other inflected forms of the same word. The algorithm is the basic building block of a stemmer, and a stemmer is mainly used in an information retrieval system to improve its performance. This paper presents a stemmer for Punjabi that uses a brute force algorithm together with a suffix stripping technique. Similar techniques can be used to build stemmers for other languages such as Hindi, Bengali, and Marathi. The stemmer gives good results and can be effective in an information retrieval system. It also reduces the problems of over-stemming and under-stemming.
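
The combination of a brute force step and suffix stripping described in the abstract might look roughly like the sketch below. It is not the authors' implementation. It assumes that "brute force" means a table lookup from inflected forms to stems, which is how the term is commonly used for stemmers. The lookup table, the suffix list, the minimum stem length, and the romanized Punjabi word forms are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical lookup table for the brute force step: inflected form -> stem.
# Entries are romanized placeholders, not data from the paper.
LOOKUP = {
    "munde": "munda",
    "mundian": "munda",
}

# Hypothetical suffix list for the suffix stripping step.
SUFFIXES = ("an", "e")

# Assumed guard: never leave a stem shorter than this many characters.
# This is one simple way to limit over-stemming of short words.
MIN_STEM_LEN = 3


def stem(word: str) -> str:
    """Return a stem for `word` using table lookup, then suffix stripping."""
    # Step 1 (brute force): exact match against the lookup table.
    if word in LOOKUP:
        return LOOKUP[word]

    # Step 2 (suffix stripping): try the longest suffixes first.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]

    # No table entry and no strippable suffix: return the word unchanged.
    return word


if __name__ == "__main__":
    for w in ("munde", "kurian", "ghar", "te"):
        print(w, "->", stem(w))
```

In this sketch the table handles forms whose stem cannot be recovered by cutting characters ("munde" maps to "munda"), while the suffix rules cover regular cases ("kurian" becomes "kuri"). The length guard leaves very short words such as "te" untouched.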

References
  1. Julie Beth Lovins (1968) "Development of a Stemming Algorithm", Mechanical Translation and Computational Linguistics, Vol. 11, No. 1, pp. 22-31.
  2. M. F. Porter (1980) "An Algorithm for Suffix Stripping", Program, Vol. 14, No. 3, pp. 130-137. URL: http://www.cs.odu.edu/~jbollen/IR04/readings/readings5.pdf.
  3. David A. Hull and Gregory Grefenstette (1996) "A Detailed Analysis of English Stemming Algorithms", Rank Xerox Research Centre, 6 chemin de Maupertuis, 38240 Meylan, France, pp. 1-16.
  4. Jörg Caumanns (1998) "A Fast and Simple Stemming Algorithm for German Words", Department of Computer Science, Free University of Berlin, pp. 1-10.
  5. Tanja Gaustad and Gosse Bouma (2000) "Accurate Stemming of Dutch for Text Classification", Language Computing, Vol. 45, No. 1, pp. 104-117.
  6. Bal Krishna Bal and Prajol Shrestha (2004) "A Morphological Analyzer and a Stemmer for Nepali", PAN Localization, Working Papers 2004-2007, pp. 324-31.
  7. Md. Zahurul Islam, Md. Nizam Uddin and Mumit Khan (2004) "A Light Weight Stemmer for Bengali and Its Use in Spelling Checker", Proceedings of the 1st International Conference on Digital Communications and Computer Applications (DCCA2007), Irbid, Jordan, pp. 87-93.
  8. Haidar Harmanani, Walid Keirouz and Saeed Raheel (2006) "A Rule Base Extensible Stemmer for Information Retrieval with Application to Arabic", The International Arab Journal of Information Technology, Vol. 3, No. 3, pp. 265-272.
  9. Ababneh Mohammad, Oqeili Saleh and Rawan A. Abdeen (2006) "Occurrences Algorithm for String Searching Based on Brute-force Algorithm", Jordan Journal of Computer Science, Vol. 2, No. 1, pp. 82-85.
  10. Jiaul H. Paik and Swapan K. Parui (2008) "A Simple Stemmer for Inflectional Languages", Journal of Documentation, Vol. 61, No. 4, pp. 476-496.
  11. Muhamad Taufik Abdullah, Fatimah Ahmad, Ramlan Mahmod and Tengku Mohd Tengku Sembok (2009) "Rules Frequency Order Stemmer for Malay Language", IJCSNS International Journal of Computer Science and Network Security, Vol. 9, No. 2, pp. 433-438.
  12. Miguel E. Ruiz and Bharath Dandala (2010) "Evaluating Stemmers and Retrieval Fusion Approaches for Hindi: UNT at FIRE 2010". URL: http://www.isical.ac.in/~fire/paper_2010/MiguelRuiz-unt-fire-2010.pdf.
  13. Ananthakrishnan Ramanathan and Durgesh D. Rao, "A Lightweight Stemmer for Hindi", in Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Workshop on Computational Linguistics for South Asian Languages (Budapest, Apr.), pp. 42-48.
  14. Abduelbaset M. Goweder, Husien A. Alhammi, Tarik Rashed and Abdulsalam Musrati, "A Hybrid Method for Stemming Arabic Text", Journal of Computer Science. URL: http://eref.uqu.edu.sa/files/eref2/folder6/f181.pdf.
  15. Samir Abdou, Patrick Ruch and Jacques Savoy (2005) "Evaluation of Stemming, Query Expansion and Manual Indexing Approaches for the Genomic Task", NIST Special Publication SP 500-266: The Fourteenth Text Retrieval Conference (TREC 2005) Proceedings. URL: http://trec.nist.gov/pubs/trec14/papers/uneuchatel.geo.pdf.
  16. James Mayfield and Paul McNamee, "Single N-gram Stemming", SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 28 - August 1, 2003, Toronto, Canada, pp. 415-416.
  17. Marie-Claire Jenkins and Dan Smith, "Conservative Stemming for Search and Indexing", School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK. URL: www.uea.ac.uk/polopoly_fs/1.85493!stemmer25feb.pdf.
  18. Hayder K. Al Ameed, Shaikha O. Al Ketbi, Amna A. Al Kaabi, Khadija S. Al Shebli, Naila F. Al Shamsi, Noura H. Al Nuaimi and Shaikha S. Al Muhairi, "Arabic Light Stemmer: A New Enhanced Approach", The Second International Conference on Innovations in Information Technology (IIT’05). URL: http://www.onlinelibrary.wiley.com/doi/10.1002/asi.21247/pdf.
  19. Ghassan Kanan, Mohammad Ababney, Riyad Al Shalabi and Alla Nal Nobani, "Building an Effective Rule Based Light Stemmer for Arabic Language to Improve Search Effectiveness", Arab Academy for Banking and Financial Science, Al Balka Applied University, pp. 312-316. URL: www.ccis2K.org/iajit/PDF/vol.3, no.3no.3/12-Haidar.pdf.
  20. www.wikipedia.com.
  21. http://www.comp.lancs.ac.uk.
Index Terms

Computer Science
Information Sciences

Keywords

Stemmer, Stemming, Brute Force Algorithm, Suffix Stripping, Under-stemming, Over-stemming, Stemming Algorithm