Research Article

Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms

by Sunita Jahirabadkar, Parag Kulkarni
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 63 - Number 20
Year of Publication: 2013
Authors: Sunita Jahirabadkar, Parag Kulkarni
DOI: 10.5120/10584-5732

Sunita Jahirabadkar, Parag Kulkarni. Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms. International Journal of Computer Applications. 63, 20 (February 2013), 29-35. DOI=10.5120/10584-5732

@article{ 10.5120/10584-5732,
author = { Sunita Jahirabadkar and Parag Kulkarni },
title = { Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 63 },
number = { 20 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 29-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume63/number20/10584-5732/ },
doi = { 10.5120/10584-5732 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:14:53.783339+05:30
%A Sunita Jahirabadkar
%A Parag Kulkarni
%T Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 63
%N 20
%P 29-35
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Finding clusters in high dimensional data is challenging because such data can comprise hundreds of attributes. Subspace clustering is an evolving methodology that, instead of finding clusters in the entire feature space, aims to find clusters in various overlapping or non-overlapping subspaces of the high dimensional dataset. Density based subspace clustering algorithms treat clusters as dense regions, as opposed to noise or border regions. Many significant density based subspace clustering algorithms exist in the literature. Each is distinguished by its own assumptions, input parameters, and techniques. Hence it is quite infeasible for future developers to compare all these algorithms on one common scale. In this paper, we present a review of various density based subspace clustering algorithms, together with a comparative chart focusing on their distinguishing characteristics, such as overlapping versus non-overlapping and axis parallel versus arbitrarily oriented, among others.
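
To make the idea of density based clustering in axis parallel subspaces concrete, the following is a minimal illustrative sketch, not any of the reviewed algorithms. It assumes a brute-force search: every subspace of up to `max_dims` attributes is enumerated and a plain DBSCAN-style procedure is run on the projected points. The function names (`project`, `dbscan`, `subspace_clusters`) and the parameters `eps`, `min_pts`, and `max_dims` are hypothetical names chosen for this example; practical algorithms prune the subspace search rather than enumerating it exhaustively.

```python
from itertools import combinations
from math import dist


def project(point, dims):
    """Keep only the coordinates listed in dims (an axis parallel subspace)."""
    return tuple(point[d] for d in dims)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns clusters as sorted lists of point indices."""
    n = len(points)
    # Brute-force eps-neighbourhoods (each point is its own neighbour).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(nb) >= min_pts for nb in neighbours]
    label = [None] * n  # None means unassigned (noise if never reached)
    clusters = []
    for i in range(n):
        if label[i] is not None or not core[i]:
            continue
        cid = len(clusters)
        label[i] = cid
        members, stack = [i], [i]
        while stack:
            p = stack.pop()  # p is always a core point
            for q in neighbours[p]:
                if label[q] is None:
                    label[q] = cid
                    members.append(q)
                    if core[q]:  # only core points expand the cluster
                        stack.append(q)
        clusters.append(sorted(members))
    return clusters


def subspace_clusters(points, eps, min_pts, max_dims=2):
    """Map each subspace (tuple of attribute indices) to the clusters found in it."""
    n_dims = len(points[0])
    result = {}
    for k in range(1, max_dims + 1):
        for dims in combinations(range(n_dims), k):
            projected = [project(p, dims) for p in points]
            found = dbscan(projected, eps, min_pts)
            if found:
                result[dims] = found
    return result
```

On data where several points agree closely in one attribute but are spread out in the others, this sketch reports a cluster in that single-attribute subspace while finding none in the full feature space, which is the situation subspace clustering is meant to address.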

Index Terms

Computer Science
Information Sciences

Keywords

Density based clustering, High dimensional data, Subspace clustering