Research Article

A Study of Bio-inspired Algorithm to Data Clustering using Different Distance Measures

by O.A.Mohamed Jafar, R. Sivakumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 12
Year of Publication: 2013
Authors: O.A.Mohamed Jafar, R. Sivakumar
10.5120/11137-6216

O.A.Mohamed Jafar, R. Sivakumar. A Study of Bio-inspired Algorithm to Data Clustering using Different Distance Measures. International Journal of Computer Applications. 66, 12 (March 2013), 33-44. DOI=10.5120/11137-6216

@article{ 10.5120/11137-6216,
author = { O.A.Mohamed Jafar, R. Sivakumar },
title = { A Study of Bio-inspired Algorithm to Data Clustering using Different Distance Measures },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 12 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 33-44 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume66/number12/11137-6216/ },
doi = { 10.5120/11137-6216 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:22:13.100435+05:30
%A O.A.Mohamed Jafar
%A R. Sivakumar
%T A Study of Bio-inspired Algorithm to Data Clustering using Different Distance Measures
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 12
%P 33-44
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining is the process of extracting previously unknown and valid information from large databases. Clustering is an important data analysis and data mining method: the unsupervised classification of objects into clusters such that objects in the same cluster are similar and objects in different clusters are dissimilar. Data clustering is a difficult unsupervised learning problem because many factors, such as distance measures, criterion functions, and initial conditions, come into play. Many algorithms have been proposed in the literature. However, some traditional algorithms have drawbacks such as sensitivity to initialization and a tendency to become trapped in local optima. Recently, bio-inspired algorithms such as ant colony optimization (ACO) and particle swarm optimization (PSO) have been applied successfully to clustering problems, as well as to several other real-life applications; they are global optimization techniques. Distance-based algorithms have been studied for clustering problems. This paper studies a particle swarm optimization algorithm for data clustering using different distance measures, namely Euclidean, Manhattan, and Chebyshev, on well-known real-life benchmark medical data sets and an artificially generated data set. The PSO-based clustering algorithm using the Chebyshev distance measure achieves better fitness values than those obtained with the Euclidean and Manhattan distance measures.
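The abstract names the three distance measures and the PSO-based clustering approach but does not give the update rules or the fitness function. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common centroid-encoding scheme (each particle holds k candidate centroids), a quantization-error-style fitness (mean distance from each point to its nearest centroid, lower is better), and the standard inertia-weight velocity update with illustrative parameter values w = 0.72 and c1 = c2 = 1.49. The names `pso_cluster` and `fitness` and the synthetic two-blob data set are hypothetical.

```python
import math
import random


# The three distance measures compared in the abstract.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def fitness(centroids, data, dist):
    # Assumed fitness (quantization-error style): mean distance from each
    # point to its nearest centroid under the chosen metric. Lower is better.
    return sum(min(dist(p, c) for c in centroids) for p in data) / len(data)


def pso_cluster(data, k, dist, n_particles=10, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    # Hypothetical helper: each particle encodes k candidate centroids.
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every particle's centroids from k distinct random data points.
    pos = [[list(p) for p in rng.sample(data, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_f = [fitness(p, data, dist) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Inertia-weight velocity update, applied per centroid coordinate.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i], data, dist)
            # Keep personal and global bests; gbest is updated immediately
            # (asynchronous update), so later particles see it this iteration.
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest, gbest_f


if __name__ == "__main__":
    # Synthetic 2-D data: two well-separated Gaussian blobs (illustrative only).
    rng = random.Random(1)
    data = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
            + [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)])
    for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan),
                       ("Chebyshev", chebyshev)]:
        centroids, f = pso_cluster(data, k=2, dist=dist)
        print(f"{name}: best fitness = {f:.3f}")
```

For any two vectors, Chebyshev distance ≤ Euclidean distance ≤ Manhattan distance. As a result, fitness values computed under different metrics are on different scales. The printed values are best compared with that property in mind.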

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Data Clustering, Bio-inspired Algorithm, Particle Swarm Optimization, Distance Measures