Research Article

A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set

by D.Napoleon, S.Pavalakodi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Number 7
Year of Publication: 2011
Authors: D.Napoleon, S.Pavalakodi
10.5120/1789-2471

D.Napoleon, S.Pavalakodi. A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set. International Journal of Computer Applications. 13, 7 (January 2011), 41-46. DOI=10.5120/1789-2471

@article{ 10.5120/1789-2471,
author = { D.Napoleon and S.Pavalakodi },
title = { A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set },
journal = { International Journal of Computer Applications },
issue_date = { January 2011 },
volume = { 13 },
number = { 7 },
month = { January },
year = { 2011 },
issn = { 0975-8887 },
pages = { 41-46 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume13/number7/1789-2471/ },
doi = { 10.5120/1789-2471 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:02:08.172849+05:30
%A D.Napoleon
%A S.Pavalakodi
%T A New Method for Dimensionality Reduction using K-Means Clustering Algorithm for High Dimensional Data Set
%J International Journal of Computer Applications
%@ 0975-8887
%V 13
%N 7
%P 41-46
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality that corresponds to the intrinsic dimensionality of the data. The K-means clustering algorithm often does not work well on high-dimensional data. To improve its efficiency, PCA is applied to the original data set to obtain a reduced data set containing possibly uncorrelated variables. In this paper, principal component analysis and linear transformation are used for dimensionality reduction, initial centroids are computed, and the result is then passed to the K-means clustering algorithm.
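
The pipeline described in the abstract (reduce with PCA, compute initial centroids, then run K-means on the reduced data) might be sketched roughly as follows. This is a minimal illustration using NumPy, not the authors' implementation. The function names `pca_reduce`, `initial_centroids`, and `kmeans` are hypothetical, and the centroid initialization shown here (sort points along the first principal component, split them into k equal groups, average each group) is an assumed stand-in, since the abstract does not specify how the initial centroids are computed.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def initial_centroids(Z, k):
    """Assumed heuristic (not from the paper): sort points along the first
    component, split them into k equal-sized groups, and average each group."""
    order = np.argsort(Z[:, 0])
    groups = np.array_split(order, k)
    return np.array([Z[g].mean(axis=0) for g in groups])


def assign(Z, centroids):
    """Return the index of the nearest centroid for every row of Z."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(Z, centroids, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = centroids.copy()
    for _ in range(n_iter):
        labels = assign(Z, centroids)
        updated = np.array([
            # Keep the old centroid if a cluster happens to be empty.
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(Z, centroids), centroids


if __name__ == "__main__":
    # Synthetic data for illustration only: two blobs in 10 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 10)),
                   rng.normal(5.0, 1.0, (50, 10))])
    Z = pca_reduce(X, n_components=2)
    labels, centers = kmeans(Z, initial_centroids(Z, k=2))
    print(Z.shape, np.bincount(labels))
```

Running K-means on the reduced matrix `Z` rather than on `X` keeps each distance computation in the lower-dimensional space, which is the efficiency gain the abstract alludes to. The number of components to retain and the value of k are left as inputs here.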

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Dimensionality Reduction, Principal component analysis, k-means algorithm, Amalgamation