
Automatic Arabic Text Clustering using K-means and K-mediods

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 51 - Number 2
Year of Publication: 2012
Authors:
Mahmud S. Alkoffash
DOI: 10.5120/8012-0675

Mahmud S. Alkoffash. Automatic Arabic Text Clustering using K-means and K-mediods. International Journal of Computer Applications 51(2):5-8, August 2012.

BibTeX

@article{key:article,
	author = {Mahmud S. Alkoffash},
	title = {Article: Automatic Arabic Text Clustering using K-means and K-mediods},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {51},
	number = {2},
	pages = {5-8},
	month = {August},
	note = {Full text available}
}

Abstract

In this study we implemented the K-means and K-medoids algorithms in order to make a practical comparison between them. The system was tested against a manually built set of clusters consisting of 242 pre-clustered documents. The results were encouraging for both algorithms, especially K-medoids: the average precision and recall were 0.56 and 0.52 for K-means, compared with 0.69 and 0.60 for K-medoids. We also extracted a feature set of keywords in order to improve performance. The results show that both algorithms can be applied to Arabic text, and that the number of examples available for each category, the selection of the feature space, the training data set used, and the value of K can strongly affect the accuracy of clustering.
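
The practical difference between the two algorithms is the center-update step: K-means moves each center to the mean of its cluster, while K-medoids picks an actual member of the cluster. The following is only a minimal sketch of that contrast, not the implementation evaluated in the paper. The abstract does not state the document representation, the distance measure, or the value of K, so the 2-D vectors, the Euclidean distance, K = 2, and all function names here are assumptions chosen for illustration.

```python
from math import dist


def assign(points, centers):
    """Group points by the index of their nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def mean_center(cluster):
    """K-means update: the coordinate-wise mean, which need not be a data point."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))


def medoid_center(cluster):
    """K-medoids update: the member with the smallest total distance to the others."""
    return min(cluster, key=lambda c: sum(dist(c, p) for p in cluster))


def cluster(points, centers, update, max_iter=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = assign(points, centers)
        # Keep the old center if a cluster ends up empty.
        new_centers = [update(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Toy 2-D "document vectors" (hypothetical values), with one outlier at (10, 10).
docs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (10.0, 10.0)]
start = [docs[0], docs[3]]  # assumed initial centers, K = 2

kmeans_centers, kmeans_groups = cluster(docs, start, mean_center)
kmedoids_centers, kmedoids_groups = cluster(docs, start, medoid_center)
```

On this toy data both runs produce the same grouping, but the outlier pulls the second K-means center away from every document, whereas the K-medoids center remains one of the documents themselves. Real text clustering would typically use high-dimensional term-weight vectors and possibly a different distance measure.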
