CFP last date
20 May 2024
Reseach Article

Choosing DBSCAN Parameters Automatically using Differential Evolution

by Amin Karami, Ronnie Johansson
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 91 - Number 7
Year of Publication: 2014
Authors: Amin Karami, Ronnie Johansson
10.5120/15890-5059

Amin Karami, Ronnie Johansson . Choosing DBSCAN Parameters Automatically using Differential Evolution. International Journal of Computer Applications. 91, 7 ( April 2014), 1-11. DOI=10.5120/15890-5059

@article{ 10.5120/15890-5059,
author = { Amin Karami, Ronnie Johansson },
title = { Choosing DBSCAN Parameters Automatically using Differential Evolution },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 91 },
number = { 7 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-11 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume91/number7/15890-5059/ },
doi = { 10.5120/15890-5059 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:12:06.915796+05:30
%A Amin Karami
%A Ronnie Johansson
%T Choosing DBSCAN Parameters Automatically using Differential Evolution
%J International Journal of Computer Applications
%@ 0975-8887
%V 91
%N 7
%P 1-11
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Over the last several years, DBSCAN (Density-Based Spatial Clustering of Applications with Noise) has been widely applied in many areas of science due to its simplicity, robustness against noise (outlier) and ability to discover clusters of arbitrary shapes. However, DBSCAN algorithm requires two initial input parameters, namely Eps (the radius of the cluster) and MinPts (the minimum data objects required inside the cluster) which both have a significant influence on the clustering results. Hence, DBSCAN is sensitive to its input parameters and it is hard to determine them a priori. This paper presents an efficient and effective hybrid clustering method, named BDE-DBSCAN, that combines Binary Differential Evolution and DBSCAN algorithm to simultaneously quickly and automatically specify appropriate parameter values for Eps and MinPts. Since the Eps parameter can largely degrades the efficiency of the DBSCAN algorithm, the combination of an analytical way for estimating Eps and Tournament Selection (TS) method is also employed. Experimental results indicate the proposed method is precise in determining appropriate input parameters of DBSCAN algorithm.

References
  1. Krista Rizman ? Zalik. An efficient ´k-means clustering algorithm. Pattern Recognitoin Letters, 29(9):1385–1391, 2008.
  2. Tran Manh Thang and Juntae Kim. The anomaly detection by using dbscan clustering with multiple parameters. In International Conference on Information Science and Applications (ICISA), pages 1–5, 2011.
  3. M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceeding of 2nd International Conference on Knowledge Discovery and Data Mining, pages 226–231, 1996.
  4. T. Ali, S. Asghar, and N. A. Sajid. Critical analysis of dbscan variations. In International Conference on Information and Emerging Technologies (ICIET), pages 1–6, 2010.
  5. Wu Ying, Yang Kai, and Zhang Jianzhong. Using dbscan clustering algorithm in spam identifying. In 2nd International Conference on Education Technology and Computer (ICETC), volume 1, pages V1–398–V1–402, 2010.
  6. Derya Birant and Alp Kut. St-dbscan: An algorithm for clustering spatialtemporal data. Data & Knowledge Engineering, 60(1):208–221, 2007.
  7. P. Viswanath and V. Suresh Babu. Rough-dbscan: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, 30(16):1477–1488, 2009.
  8. Amin Karami. Data clustering for anomaly detection in content-centric networks. International Journal of Computer Applications, 81(7):1–8, November 2013. Published by Foundation of Computer Science, New York, USA.
  9. Guilherme Andrade, Gabriel Ramos, Daniel Madeira, Rafael Sachetto, Renato Ferreira, and Leonardo Rocha. G-dbscan: A gpu accelerated algorithm for density-based clustering. Procedia Computer Science, 18:369–378, 2013.
  10. G. H. Shah. An improved dbscan, a density based clustering algorithm with parameter selection for high dimensional data sets. In Nirma University International Engineering (NUiCONE), pages 1–6, 2012.
  11. Thanh N. Tran, Klaudia Drab, and Michal Daszykowski. Revised dbscan algorithm to cluster data with dense adjacent clusters. Chemometrics and Intelligent Laboratory Systems, 120:92–96, 2013.
  12. Hua Jiang, Jing Li, Shenghe Yi, Xiangyang Wang, and Xin Hu. A new hybrid method based on partitioning-based dbscan and ant clustering. Expert Systems with Applications, 38(8), 2011.
  13. Chen Xiaoyun, Min Yufang, Zhao Yan, and Wang Ping. Gmdbscan: Multi-density dbscan cluster based on grid. In IEEE International Conference on e-Business Engineering (ICEBE '08), pages 780–783, 2008.
  14. Li Xue-yong, Gao Guo-hong, and Sun Jia-xia. A new intrusion detection method based on improved dbscan. In International Conference on Information Engineering (ICIE), volume 2, pages 117–120, 2010.
  15. A. Tepwankul and S. Maneewongwattana. U-dbscan : A density-based clustering algorithm for uncertain objects. In 26th IEEE International Conference on Data Engineering Workshops (ICDEW), pages 136–143, 2010.
  16. Damodar Reddy Edla and Prasanta K. Jana. A prototypebased modified dbscan for gene clustering. Procedia Technology, 6:485–492, 2012.
  17. Huang Darong and Wang Peng. Grid-based dbscan algorithm with referential parameters. Physics Procedia, 24, Part B:1166–1170, 2012.
  18. A. Smiti and Z. Elouedi. Dbscan-gm: An improved clustering method based on gaussian means and dbscan techniques. In 16th International Conference on Intelligent Engineering Systems (INES), pages 573–578, 2012.
  19. G. P. Babu and M. N. Murty. Simulated annealing for selecting optimal initial seeds in the k-means algorithm. Indian Journal of Pure and Applied Mathematics, 25(1-2):85– 94, 1994.
  20. Amin Karami and Manel Guerrero-Zapata. A fuzzy anomaly detection system based on hybrid pso-kmeans algorithm in content-centric networks. Neurocomputing, 2014.
  21. George E. Tsekouras. A simple and effective algorithm for implementing particle swarm optimization in rbf network's design using input-output fuzzy clustering. Neurocomputing, 108:36–44, 2013.
  22. Junyan Chen. Hybrid clustering algorithm based on pso with the multidimensional asynchronism and stochastic disturbance method. Journal of Theoretical and Applied Information Technology, 46(1):434–440, 2012.
  23. Zhenkui Pei, Xia Hua, and Jinfeng Han. The clustering algorithm based on particle swarm optimization algorithm. In Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation - Volume 01, ICICTA '08, pages 148–151, Washington, DC, USA, 2008. IEEE Computer Society.
  24. Li Lin and Liu Suhua. Wheat cultivar classifications based on tabu search and fuzzy c-means clustering algorithm. In 4th International Conference on Computational and Information Sciences (ICCIS), pages 493–496, 2012.
  25. Hong bing Xu, Hou jun Wang, and Chun-Guang Li. Fuzzy tabu search method for the clustering problem. In Proceedings International Conference on Machine Learning and Cybernetics, volume 2, pages 876–880, 2002.
  26. Yangyang Li, Jing Chen, Ruochen Liu, and Jianshe Wu. A spectral clustering-based adaptive hybrid multi-objective harmony search algorithm for community detection. In IEEE Congress on Evolutionary Computation (CEC), pages 1–8, 2012.
  27. C. Cobos, J. Andrade, W. Constain, M. Mendoza, and E. Le´on. Web document clustering based on global-best harmony search, k-means, frequent term sets and bayesian information criterion. In IEEE Congress on Evolutionary Computation (CEC), pages 1–8, 2010.
  28. Liang Li, Shi bao Lu, Xue song Chu, and Guang ming Yu. A combinatorial search method based on harmony search algorithm and particle swarm optimization in slope stability analysis. In International Conference on Computational Intelligence and Software Engineering (CiSE), pages 1–4, 2009.
  29. E. Hancer, C. Ozturk, and D. Karaboga. Artificial bee colony based image clustering method. In IEEE Congress on Evolutionary Computation (CEC), pages 1–5, 2012.
  30. D. Karaboga, S. Okdem, and C. Ozturk. Cluster based wireless sensor network routings using artificial bee colony algorithm. In International Conference on Autonomous and Intelligent Systems (AIS), pages 1–5, 2010.
  31. Y. Marinakis, M. Marinaki, and N. Matsatsinis. A hybrid discrete artificial bee colony - grasp algorithm for clustering. In International Conference on Computers Industrial Engineering (CIE), pages 548–553, 2009.
  32. Bo Zhao, Zhongxiang Zhu, Enrong Mao, and Zhenghe Song. Image segmentation based on ant colony optimization and kmeans clustering. In IEEE International Conference on Automation and Logistics, pages 459–463, 2007.
  33. Yanfang Han and Pengfei Shi. An improved ant colony algorithm for fuzzy clustering in image segmentation. Neurocomputing, 70(46):665–671, 2007.
  34. J. W. Han and M. Kamber. Data Mining Concepts and Techniques, second edition. Morgan Kaufmann Publishers, 2006.
  35. Stanisaw Sieniutycz and Jacek Jeowski. 1 - brief review of static optimization methods. In Energy Optimization in Process Systems and Fuel Cells (Second Edition), pages 1–43. Elsevier, Amsterdam, second edition edition, 2013.
  36. Yang Sun, Lingbo Zhang, and Xingsheng Gu. A hybrid coevolutionary cultural algorithm based on particle swarm optimization for solving global optimization problems. Neurocomputing, 98(3):76–89, 2012.
  37. Hongfang Zhou, Peng Wang, and Hongyan Li. Research on adaptive parameters determination in dbscan algorithm. Journal of Information & Computational Science, 9(7):1967– 1973, 2012.
  38. Mohammed Azmi Al-Betar, Ahamad Tajudin Khader, Zong Woo Geem, Iyad Abu Doush, and Mohammed A. Awadallah. An analysis of selection methods in memory consideration for harmony search. Applied Mathematics and Computation, 219(22):10753–10767, 2013.
  39. R. Storn and K. Price. Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11:341–359, 1997.
  40. Dexuan Zou, JianhuaWu, Liqun Gao, and Steven Li. A modified differential evolution algorithm for unconstrained optimization problems. Neurocomputing, 120:469–481, 2013.
  41. Marco Locatelli, Mirko Maischberger, and Fabio Schoen. Differential evolution methods based on local searches. Computers & Operations Research, 43:169–180, 2014.
  42. Yang Lou, Junli Li, and Yongsheng Wang. A binarydifferential evolution algorithm based on ordering of individuals. In 6th International Conference on Natural Computation (ICNC), pages 2207–2211, 2010.
  43. Youyun Ao and Hongqin Chi. Experimental study on differential evolution strategies. In Global Congress on Intelligent Systems, pages 19–24, 2009.
  44. Ling Wang, Xiping Fu, Yunfei Mao, Muhammad Ilyas Menhas, and Minrui Fei. A novel modified binary differential evolution algorithm and its applications. Neurocomputing, 98:55–75, 2012.
  45. M. Daszykowski, B. Walczak, and D. L. Massart. Looking for natural patterns in data: Part 1. density-based approach. Chemometrics and Intelligent Laboratory Systems, 56:83–92, 2001.
  46. A. Gionis, H. Mannila, and P. Tsaparas. Clustering aggregation. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):1–30, 2007.
  47. C. T. Zahn. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers, 100(1):68–86, 1971.
  48. H. Chang and D. Y. Yeung. Robust path-based spectral clustering. Pattern Recognition, 41(1):191–203, 2008.
  49. Sameh A. Salem and A. K. Nandi. New assessment criteria for clustering algorithms. In IEEE Workshop on Machine Learning for Signal Processing, pages 285–290, 2005.
  50. Anil K. Jain and Richard C. Dubes. Algorithms for clustering data. Prentice-Hall, Inc. , Upper Saddle River, NJ, USA, 1988.
  51. Amin Karami. Utilization and comparison of multi attribute decision making techniques to rank bayesian network options. master thesis, University of Sk¨ovde, Sk¨ovde, Sweden, 2011.
  52. Amin Karami and Ronnie Johansson. Utilization of multi attribute decision making techniques to integrate automatic and manual ranking of options. Journal of Information Science and Engineering, 30(2):519–534, 2014.
  53. David L. Davies and Donald W. Bouldin. A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, 1979.
  54. J. C. Dunn. Well separated clusters and optimal fuzzy partitions. Cybernetics, 4:95–104, 1974. 11
Index Terms

Computer Science
Information Sciences

Keywords

Clustering Analysis DBSCAN Differential Evolution Tournament Selection