Automatic Arabic Text Clustering using K-means and K-mediods

Mahmud S. Alkoffash

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 51 - Number 2

Year of Publication: 2012

Authors: Mahmud S. Alkoffash

10.5120/8012-0675

Mahmud S. Alkoffash. Automatic Arabic Text Clustering using K-means and K-mediods. International Journal of Computer Applications. 51, 2 (August 2012), 5-8. DOI=10.5120/8012-0675

@article{10.5120/8012-0675,
  author = {Mahmud S. Alkoffash},
  title = {Automatic Arabic Text Clustering using K-means and K-mediods},
  journal = {International Journal of Computer Applications},
  issue_date = {August 2012},
  volume = {51},
  number = {2},
  month = {August},
  year = {2012},
  issn = {0975-8887},
  pages = {5-8},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume51/number2/8012-0675/},
  doi = {10.5120/8012-0675},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:49:21.223918+05:30
%A Mahmud S. Alkoffash
%T Automatic Arabic Text Clustering using K-means and K-mediods
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 2
%P 5-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In this study we implemented the K-means and K-medoids algorithms in order to make a practical comparison between them. The system was tested on a manually clustered set of 242 documents. The results were encouraging for both algorithms, especially K-medoids: average precision and recall were 0.56 and 0.52 for K-means, compared with 0.69 and 0.60 for K-medoids. We also extracted a feature set of keywords in order to improve performance. The results illustrate that both algorithms can be applied to Arabic text, and that the number of examples available for each category, the selection of the feature space, the training data set used, and the value of K can strongly affect clustering accuracy.
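
To make the comparison in the abstract concrete, the following is a minimal Python sketch of how the two algorithms differ. It is not the paper's implementation. It assumes documents have already been converted to numeric feature vectors (the toy 2-D points below are invented for illustration), uses Euclidean distance, and takes the initial centers as given. All function names are hypothetical. The precision and recall helper assumes a per-cluster definition against a manually assigned category, which may differ from the evaluation used in the study.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans_update(points, labels, centers):
    """K-means step: each center moves to the mean of its members."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:              # keep an empty cluster's center unchanged
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centers

def kmedoids_update(points, labels, centers):
    """K-medoids step: each center becomes the member with the smallest
    total distance to the other members, so it is always a real document."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(min(members,
                               key=lambda m: sum(math.dist(m, q) for q in members)))
    return new_centers

def cluster(points, init_centers, update, max_iter=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = list(init_centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers

def precision_recall(pred_labels, gold_labels, cluster_id, category):
    """Precision and recall of one cluster against one manual category
    (an assumed evaluation scheme, not necessarily the paper's)."""
    in_cluster = {i for i, lab in enumerate(pred_labels) if lab == cluster_id}
    in_category = {i for i, g in enumerate(gold_labels) if g == category}
    overlap = len(in_cluster & in_category)
    precision = overlap / len(in_cluster) if in_cluster else 0.0
    recall = overlap / len(in_category) if in_category else 0.0
    return precision, recall

# Toy data: six invented "document vectors" with made-up manual categories.
points = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.3), (0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
gold = ["sport", "sport", "sport", "economy", "economy", "economy"]
init = [points[0], points[3]]

kmeans_labels, means = cluster(points, init, kmeans_update)
kmedoids_labels, medoids = cluster(points, init, kmedoids_update)
p, r = precision_recall(kmedoids_labels, gold, 0, "sport")
```

The only difference between the two runs is the update function: K-means may place a center anywhere in the feature space, while K-medoids restricts each center to an actual data point, which generally makes it less sensitive to outliers.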

Index Terms

Computer Science

Information Sciences

Keywords

Arabic Text Clustering, Data mining, Kmeans, Kmediods