Research Article

Automatic Arabic Text Clustering using K-means and K-mediods

by Mahmud S. Alkoffash
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Number 2
Year of Publication: 2012
Authors: Mahmud S. Alkoffash
10.5120/8012-0675

Mahmud S. Alkoffash. Automatic Arabic Text Clustering using K-means and K-mediods. International Journal of Computer Applications. 51, 2 (August 2012), 5-8. DOI=10.5120/8012-0675

@article{ 10.5120/8012-0675,
author = { Mahmud S. Alkoffash },
title = { Automatic Arabic Text Clustering using K-means and K-mediods },
journal = { International Journal of Computer Applications },
issue_date = { August 2012 },
volume = { 51 },
number = { 2 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 5-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume51/number2/8012-0675/ },
doi = { 10.5120/8012-0675 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:49:21.223918+05:30
%A Mahmud S. Alkoffash
%T Automatic Arabic Text Clustering using K-means and K-mediods
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 2
%P 5-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this study we implemented the K-means and K-medoids algorithms in order to make a practical comparison between them. The system was tested on a manually built set of clusters consisting of 242 pre-clustered documents. The results are encouraging for both algorithms, especially K-medoids: the average precision and recall are 0.56 and 0.52 for K-means, compared with 0.69 and 0.60 for K-medoids. We also extracted a feature set of keywords in order to improve performance. The results illustrate that both algorithms can be applied to Arabic text, and that the number of examples for each category, the selection of the feature space, the training data set used and the value of K can enormously affect the accuracy of clustering.
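To make the comparison in the abstract concrete, the following is a minimal sketch of the two algorithms and of one common way to score clusters against predefined categories. It is not the paper's implementation. The toy two-dimensional vectors, the category names, the Euclidean distance, the random initialisation, and the majority-class matching used for precision and recall are all assumptions chosen for illustration; the abstract does not specify the document representation or the exact evaluation scheme.

```python
import math
import random
from collections import Counter

def dist(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, centers):
    """For every point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """K-means: each center is the mean of the points assigned to it."""
    centers = random.Random(seed).sample(points, k)
    labels = assign(points, centers)
    for _ in range(iters):
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels

def kmedoids(points, k, iters=100, seed=0):
    """K-medoids: each center must be an actual data point (the medoid)."""
    medoids = random.Random(seed).sample(points, k)
    labels = assign(points, medoids)
    for _ in range(iters):
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                # medoid = member with the smallest total distance to the others
                medoids[c] = min(members,
                                 key=lambda m: sum(dist(m, q) for q in members))
        new_labels = assign(points, medoids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def precision_recall(labels, truth):
    """Average precision and recall, matching each cluster to its majority class."""
    class_sizes = Counter(truth)
    precisions, recalls = [], []
    for c in set(labels):
        members = [t for l, t in zip(labels, truth) if l == c]
        cls, hits = Counter(members).most_common(1)[0]
        precisions.append(hits / len(members))    # purity of the cluster
        recalls.append(hits / class_sizes[cls])   # share of the class recovered
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

# Hypothetical term-weight vectors for six documents in two made-up categories.
points = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
truth = ["sport"] * 3 + ["economy"] * 3

print("K-means  :", precision_recall(kmeans(points, 2), truth))
print("K-medoids:", precision_recall(kmedoids(points, 2), truth))
```

The only difference between the two functions is the update step: K-means replaces each center with the mean of its members, while K-medoids restricts the center to one of the members, which tends to make it less sensitive to outlying documents. On real text, the vectors would come from a feature-extraction step such as keyword weighting, and the scores would depend on K and on the initial centers.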

Index Terms

Computer Science
Information Sciences

Keywords

Arabic Text Clustering Data mining Kmeans Kmediods