Mind Map based Survey of Conventional and Recent Clustering Algorithms: Learning’s for Development of Parallel and Distributed Clustering Algorithms

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Rahul Joshi, Preeti Mulay
10.5120/ijca2018917487

Rahul Joshi and Preeti Mulay. Mind Map based Survey of Conventional and Recent Clustering Algorithms: Learning’s for Development of Parallel and Distributed Clustering Algorithms. International Journal of Computer Applications 181(4):14-21, July 2018. BibTeX

@article{10.5120/ijca2018917487,
	author = {Rahul Joshi and Preeti Mulay},
	title = {Mind Map based Survey of Conventional and Recent Clustering Algorithms: Learning’s for Development of Parallel and Distributed Clustering Algorithms},
	journal = {International Journal of Computer Applications},
	issue_date = {July 2018},
	volume = {181},
	number = {4},
	month = {Jul},
	year = {2018},
	issn = {0975-8887},
	pages = {14-21},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume181/number4/29703-2018917487},
	doi = {10.5120/ijca2018917487},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

To date, several surveys of clustering algorithms have been published. The novel approach of this paper is the use of Mind Maps to present key details about clustering algorithms in visual form. The paper presents Mind Maps for the basic clustering process, similarity and distance indices, evaluation indices, conventional clustering algorithms, recent clustering algorithms, and recent parallel and distributed clustering algorithms, along with key learnings about the development of parallel and distributed clustering algorithms.

Keywords

Mind Map; Clustering; Learning; Parallel; Distributed; Algorithm