Research Article

Text Documents Clustering using Genetic Algorithm and Discrete Differential Evolution

by Yogesh Kumar Meena, Shashank, Vibhav Prakash Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 43 - Number 1
Year of Publication: 2012
Authors: Yogesh Kumar Meena, Shashank, Vibhav Prakash Singh
10.5120/6067-8221

Yogesh Kumar Meena, Shashank, Vibhav Prakash Singh. Text Documents Clustering using Genetic Algorithm and Discrete Differential Evolution. International Journal of Computer Applications. 43, 1 (April 2012), 16-19. DOI=10.5120/6067-8221

@article{ 10.5120/6067-8221,
author = { Yogesh Kumar Meena and Shashank and Vibhav Prakash Singh },
title = { Text Documents Clustering using Genetic Algorithm and Discrete Differential Evolution },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 43 },
number = { 1 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 16-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume43/number1/6067-8221/ },
doi = { 10.5120/6067-8221 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:32:15.062675+05:30
%A Yogesh Kumar Meena
%A Shashank
%A Vibhav Prakash Singh
%T Text Documents Clustering using Genetic Algorithm and Discrete Differential Evolution
%J International Journal of Computer Applications
%@ 0975-8887
%V 43
%N 1
%P 16-19
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering in data mining is a discovery process that groups a set of documents so that documents within a cluster are highly similar while documents in different clusters are dissimilar. K-means is a popular clustering method, but its results depend on the choice of initial cluster centers, so it easily converges to a local optimum. The Genetic Algorithm (GA) is an optimization method that can be applied to find the best cluster centers, but it sometimes needs many iterations to do so. In this paper, we combine features of GA with features of Discrete Differential Evolution (DDE) to solve the text document clustering problem. To test the efficiency of our algorithm, we use a sample of the Reuters-21578 collection. The experimental results show that our algorithm performs better than either GA or DDE alone.
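The sensitivity of K-means to its initial centers, and the idea of searching over candidate centers with a GA, can be illustrated with a minimal sketch. This is not the algorithm evaluated in the paper: the toy 2-D points stand in for document vectors, and the names `sq_dist`, `sse`, `kmeans`, and `ga_centers`, the squared-error fitness, and all parameter values are assumptions made for illustration. The DDE component is not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style K-means starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster; keep it if the cluster is empty.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def ga_centers(points, k, pop_size=8, generations=20, seed=0):
    """Toy GA (assumed design): an individual is a list of k centers drawn
    from the data; fitness is SSE, lower is better."""
    rng = random.Random(seed)
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: sse(points, ind))
        parents = pop[: pop_size // 2]            # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = list(a[:cut]) + list(b[cut:])  # one-point crossover
            child[rng.randrange(k)] = rng.choice(points)  # mutation: swap in a data point
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: sse(points, ind))

# Four toy points forming two obvious groups (x = 0 and x = 4).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

bad = kmeans(points, [(2, 0), (2, 1)])    # stuck: splits the data along the wrong axis
good = kmeans(points, [(0, 0), (4, 0)])   # reaches the natural grouping
refined = kmeans(points, ga_centers(points, k=2))  # GA-chosen seeds, then K-means

print(sse(points, bad), sse(points, good), sse(points, refined))
```

On this data the first initialization converges immediately to a poor local optimum (squared error 16.0), while the second reaches the natural grouping (squared error 1.0). Seeding K-means from the best individual found by the toy GA is one simple way to make the outcome less dependent on a single starting choice.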

Index Terms

Computer Science
Information Sciences

Keywords

Genetic Algorithm, Discrete Differential Evolution, Document Clustering