
Adaptation of Cuckoo search for Documents Clustering

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 86 - Number 1
Year of Publication: 2014
Authors:
Walid Mohamed Aly
Hany Atef Kelleny
DOI: 10.5120/14947-3041

Walid Mohamed Aly and Hany Atef Kelleny. Article: Adaptation of Cuckoo search for Documents Clustering. International Journal of Computer Applications 86(1):4-10, January 2014. Full text available. BibTeX

@article{key:article,
	author = {Walid Mohamed Aly and Hany Atef Kelleny},
	title = {Article: Adaptation of Cuckoo search for Documents Clustering},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {86},
	number = {1},
	pages = {4-10},
	month = {January},
	note = {Full text available}
}

Abstract

Automatic clustering of unstructured documents has become an indispensable task, especially given the growing volume of electronic documents. Automatic document clustering assigns each document to a sub-group based on its content. K-means is one of the most popular unsupervised clustering algorithms, though the quality of its results relies heavily on the number of clusters chosen and on the selection of the initial cluster centroids. Cuckoo search is one of the most recent soft computing intelligent algorithms and can serve as an efficient search method in many optimization problems. In this paper, the original Cuckoo search algorithm is adapted so that it can be applied efficiently to the document clustering problem. Our proposed modification enables Cuckoo search to use dynamic nests so that different values for the number of clusters can be explored; these nests are initialized with different corresponding Forgy selections of initial centroids. During the implementation, these dynamic nests are updated using Lévy flight random walk and evaluated to detect the best nest. The proposed work is implemented and compared to the classical K-means clustering algorithm. The purity measure was used to evaluate the performance. Results show the efficiency of the proposed approach.
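
The three building blocks named in the abstract (Forgy selection of initial centroids, a Lévy flight step, and the purity measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of Mantegna's algorithm to draw Lévy steps, the exponent `beta=1.5`, and the `0.01` step scale are all assumptions chosen for illustration, and the toy points stand in for document vectors.

```python
import math
import random
from collections import Counter


def forgy_init(points, k, rng):
    """Forgy selection: pick k distinct data points as the initial centroids."""
    return [list(p) for p in rng.sample(points, k)]


def levy_step(dim, rng, beta=1.5):
    """Draw one heavy-tailed step per dimension (Mantegna's algorithm, assumed here)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    steps = []
    for _ in range(dim):
        u = rng.gauss(0.0, sigma_u)
        v = rng.gauss(0.0, 1.0) or 1e-12  # guard against an exact zero draw
        steps.append(u / abs(v) ** (1 / beta))
    return steps


def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for c, t in zip(cluster_ids, class_labels):
        by_cluster.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(class_labels)


rng = random.Random(0)
docs = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]  # toy "document vectors"

# Dynamic nests: each nest holds its own number of clusters k (here 2 and 3).
nests = [forgy_init(docs, k, rng) for k in (2, 3)]

# Perturb every centroid of every nest with a scaled Levy step (scale is an assumption).
moved = [
    [[c + 0.01 * s for c, s in zip(centroid, levy_step(len(centroid), rng))]
     for centroid in nest]
    for nest in nests
]

# Cluster 0 holds {a, a}, cluster 1 holds {b, b, a}: 4 of 5 documents match the majority.
print(purity([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"]))  # 0.8
```

In a full search loop, each perturbed nest would be scored (for example after running K-means from its centroids) and the best-scoring nest retained; that loop is omitted here because the abstract does not specify its details.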
