Research Article

Stemming Effectiveness in Clustering of Arabic Documents

by Osama A. Ghanem, Wesam M. Ashour
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 49 - Number 5
Year of Publication: 2012
10.5120/7620-0674

Osama A. Ghanem, Wesam M. Ashour. Stemming Effectiveness in Clustering of Arabic Documents. International Journal of Computer Applications. 49, 5 (July 2012), 1-6. DOI=10.5120/7620-0674

@article{ 10.5120/7620-0674,
author = { Osama A. Ghanem and Wesam M. Ashour },
title = { Stemming Effectiveness in Clustering of Arabic Documents },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 49 },
number = { 5 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume49/number5/7620-0674/ },
doi = { 10.5120/7620-0674 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:45:27.900568+05:30
%A Osama A. Ghanem
%A Wesam M. Ashour
%T Stemming Effectiveness in Clustering of Arabic Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 49
%N 5
%P 1-6
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is an important task that gives good results in information retrieval (IR); it aims to automatically group similar documents into one cluster. Stemming is an important technique used as a feature selection step: it merges redundant features that share the same root in root-based stemming, or the same syntactical form in light stemming. Stemming has several advantages: it reduces document size, increases processing speed, and is used in many applications such as IR. In this paper, we evaluate stemming techniques in the clustering of Arabic documents and determine the most effective one for pre-processing Arabic, which is more complex than most other languages. The evaluation compares three settings: root-based stemming, light stemming, and no stemming. K-means, one of the best-known and most widely used clustering algorithms, is applied for clustering. The evaluation relies on recall, precision, and F-measure. Experimental results show that light stemming achieves the best results in terms of recall, precision, and F-measure compared with the other settings.
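
As a rough illustration of two steps named in the abstract, the sketch below shows a toy light stemmer and a class-weighted clustering F-measure in Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the affix lists, the `min_len` threshold, and the function names `light_stem` and `clustering_f_measure` are hypothetical. The abstract also does not say how precision, recall, and F-measure were aggregated over clusters, so the best-match, class-weighted form used here is only one common choice.

```python
from collections import Counter

# Illustrative affix lists (an assumption, loosely modelled on common
# Arabic light stemmers; not the lists used in the paper).
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping >= min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted F-measure of a clustering against known classes.

    For class i and cluster j with overlap n_ij:
        precision = n_ij / |cluster j|, recall = n_ij / |class i|,
        F(i, j)   = 2 * precision * recall / (precision + recall).
    Each class is scored by its best-matching cluster, and the class
    scores are averaged with weights |class i| / N.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue  # no shared documents, F(i, j) is 0
            precision = n_both / n_clu
            recall = n_both / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total
```

For example, `light_stem("المكتبة")` strips the definite article and the final ta marbuta, and `clustering_f_measure(["sport"] * 3 + ["econ"] * 3, [0, 0, 1, 1, 1, 1])` evaluates to about 0.83. In a full pipeline, the stemmed tokens would feed a term-weighting step and K-means before this score is computed.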

Index Terms

Computer Science
Information Sciences

Keywords

Arabic text clustering, Stemming, light stemming, K-means