CFP last date
20 May 2024
Reseach Article

Article:Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms

by G.Sangeetha, Sornamaheswari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 10 - Number 4
Year of Publication: 2010
Authors: G.Sangeetha, Sornamaheswari
10.5120/1468-1983

G.Sangeetha, Sornamaheswari . Article:Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms. International Journal of Computer Applications. 10, 4 ( November 2010), 28-34. DOI=10.5120/1468-1983

@article{ 10.5120/1468-1983,
author = { G.Sangeetha, Sornamaheswari },
title = { Article:Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { November 2010 },
volume = { 10 },
number = { 4 },
month = { November },
year = { 2010 },
issn = { 0975-8887 },
pages = { 28-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume10/number4/1468-1983/ },
doi = { 10.5120/1468-1983 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:58:53.013177+05:30
%A G.Sangeetha
%A Sornamaheswari
%T Article:Density Conscious Subspace Clustering for High Dimensional Data using Genetic Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 10
%N 4
%P 28-34
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering has been recognized as an important and valuable capability in the data mining field. Instead of finding clusters in the full feature space, subspace clustering is an emergent task which aims at detecting clusters embedded in subspaces. Most of previous works in the literature are density-based approaches, where a cluster is regarded as a high-density region in a subspace. However, the identification of dense regions in previous works lacks of considering a critical problem, called “the density divergence problem” in this thesis, which refers to the phenomenon that the region densities vary in different subspace cardinalities. Without considering this problem, previous works utilize a density threshold to discover the dense regions in all subspaces, which incurs the serious loss of clustering accuracy (either recall or precision of the resulting clusters) in different subspace cardinalities. To tackle the density divergence problem, in this thesis, we devise a novel subspace clustering model to discover the clusters based on the relative region densities in the subspaces, where the clusters are regarded as regions whose densities are relatively high as compared to the region densities in a subspace. Based on this idea, different density thresholds are adaptively determined to discover the clusters in different subspace cardinalities. Due to the infeasibility of applying previous techniques in this novel clustering model, we also devise an innovative algorithm, referred to as DENCOS (DENsity Conscious Subspace clustering), to adopt a divide-and-conquer scheme to efficiently discover clusters satisfying different density thresholds in different subspace cardinalities. Another approach for subspace clustering in high dimensional data is proposed using Genetic Approach. The GAs work with a population of individuals representing abstract representations of feasible solutions. Each individual is assigned a fitness that is a measure of how good solution it represents. The better the solution is, the higher the fitness value it gets. The population evolves towards better solutions. The evolution starts from a population of completely random individuals and iterates in generations. In each generation, the fitness of each individual is evaluated. Individuals are stochastically selected from a current population (based on their fitness), and modified by means of operators mutation and crossover to form a new population. It is capable of optimizing the number of clusters for tasks with well formed and separated clusters. As validated by our extensive experiments on retail data set, GENETIC can discover the clusters in all subspaces with high quality, and the efficiency of GENETIC outperforms previous works using DENCOS.

References
  1. C.C. Aggarwal, A. Hinneburg, and D. Keim, “On the Surprising Behavior of Distance Metrics in High Dimensional Space,” Proc. Eighth Int’l Conf. Database Theory (ICDT), 2001.
  2. I. Assent, R. Krieger, E. Muller, and T. Seidl, “DUSC: Dimensionality Unbiased Subspace Clustering,” Proc. IEEE Int’l Conf. Data Mining (ICDM), 2007.
  3. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” Proc. Second Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD), 1996.
  4. H. Fang, C. Zhai, L. Liu, and J. Yang, “Subspace Clustering for Microarray Data Analysis: Multiple Criteria and Significance,” Proc. IEEE Computational Systems Bioinformatics Conf., 2004.
  5. G. Moise, J. Sander, and M. Ester, “P3C: A Robust Projected Clustering Algorithm,” Proc. Sixth IEEE Int’l Conf. Data Mining (ICDM), 2006.
  6. L. Parsons, E. Haque, and H. Liu, “Subspace Clustering for High Dimensional Data: A Review,” ACM SIGKDD Explorations Newsletter, vol. 6, pp. 90-105, 2004.
  7. H.S. Nagesh, S. Goil, and A. Choudhary, “Adaptive Grids for Clustering Massive Data Sets,” Proc. First SIAM Int’l Conf. Data Mining (SDM), 2001.
  8. H.-P. Kriegel, P. Kroger, M. Renz, and S. Wurst, “A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data,” Proc. Fifth IEEE Int’l Conf. Data Mining (ICDM), 2005.
  9. Y.B. Kim, J.H. Oh, and J. Gao, “Emerging Pattern Based Subspace Clustering of Microarray Gene Expression DataUsing Mixture Models,” Proc. Int’l Conf. Bioinformatics and Its Applications (ICBA), 2004
Index Terms

Computer Science
Information Sciences

Keywords

Density Conscious High Dimensional Genetic Algorithms