Research Article

Stop-Word Removal Algorithm and its Implementation for Sanskrit Language

by Jaideepsinh K. Raulji, Jatinderkumar R. Saini
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 150 - Number 2
Year of Publication: 2016
DOI: 10.5120/ijca2016911462

Jaideepsinh K. Raulji and Jatinderkumar R. Saini. Stop-Word Removal Algorithm and its Implementation for Sanskrit Language. International Journal of Computer Applications 150, 2 (Sep 2016), 15-17. DOI=10.5120/ijca2016911462

@article{ 10.5120/ijca2016911462,
author = { Jaideepsinh K. Raulji, Jatinderkumar R. Saini },
title = { Stop-Word Removal Algorithm and its Implementation for Sanskrit Language },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2016 },
volume = { 150 },
number = { 2 },
month = { Sep },
year = { 2016 },
issn = { 0975-8887 },
pages = { 15-17 },
numpages = {3},
url = { https://ijcaonline.org/archives/volume150/number2/26065-2016911462/ },
doi = { 10.5120/ijca2016911462 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:54:50.216840+05:30
%A Jaideepsinh K. Raulji
%A Jatinderkumar R. Saini
%T Stop-Word Removal Algorithm and its Implementation for Sanskrit Language
%J International Journal of Computer Applications
%@ 0975-8887
%V 150
%N 2
%P 15-17
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In the information era, optimizing the processes behind Information Retrieval, Text Summarization, and Text and Data Analytics systems has become of utmost importance. To achieve accuracy, redundant words that carry little or no semantic meaning must therefore be filtered out. Such words are known as stopwords. Stopword lists have been developed for languages such as English, Chinese, Arabic and Hindi, and a stopword list is also available for the Sanskrit language. Stop-word removal is an important preprocessing technique used in Natural Language Processing applications to improve the performance of Information Retrieval systems, Text Analytics and Processing systems, Text Summarization, Question-Answering systems, stemming, etc. In this paper, a simple approach is used to design a stop-word removal algorithm and implement it for the Sanskrit language. The algorithm and its implementation use a dictionary-based approach, in which a predefined list of stopwords is compared against the target text from which stopwords are to be removed.
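The dictionary-based approach described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the seven-word set `SAMPLE_SANSKRIT_STOPWORDS` is a hypothetical sample of common Sanskrit indeclinables (avyaya), not the stopword list used in the paper, and the helper names `tokenize` and `remove_stopwords` are illustrative. Tokenization here simply treats whitespace, the Devanagari danda (।) and double danda (॥), and common ASCII punctuation as separators.

```python
import re

# Hypothetical sample only: a few common Sanskrit indeclinables (avyaya).
# This is NOT the stopword list from the paper; substitute a full list in practice.
SAMPLE_SANSKRIT_STOPWORDS = frozenset({"च", "वा", "तु", "एव", "अपि", "इति", "हि"})

# Separators assumed for this sketch: Devanagari danda (U+0964),
# double danda (U+0965) and common ASCII punctuation.
_PUNCT = re.compile(r"[\u0964\u0965,.;:!?\"'()\[\]]")


def tokenize(text: str) -> list[str]:
    """Replace punctuation with spaces, then split on runs of whitespace."""
    return _PUNCT.sub(" ", text).split()


def remove_stopwords(
    text: str, stopwords: frozenset[str] = SAMPLE_SANSKRIT_STOPWORDS
) -> list[str]:
    """Dictionary-based removal: keep every token not found in the predefined list."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


if __name__ == "__main__":
    sentence = "रामः च लक्ष्मणः च वनं गच्छतः ।"
    print(remove_stopwords(sentence))
```

Storing the list in a hash-based set makes each lookup constant time on average, so removal is linear in the number of tokens. One limitation of exact dictionary matching in Sanskrit is sandhi: a particle fused with a neighbouring word (for example च + एव written as चैव) is not matched unless the fused form is also in the list or the text is segmented first.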

Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval (IR), Natural Language Processing (NLP), Sanskrit, Stopword, Tokenization.