Research Article

Reducing and Clustering high Dimensional Data through Principal Component Analysis

by R.Indhumathi, Dr.S.Sathiyabama
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 11 - Number 8
Year of Publication: 2010
Authors: R.Indhumathi, Dr.S.Sathiyabama

R.Indhumathi, Dr.S.Sathiyabama. Reducing and Clustering high Dimensional Data through Principal Component Analysis. International Journal of Computer Applications. 11, 8 (December 2010), 1-4. DOI=10.5120/1606-2158

@article{ 10.5120/1606-2158,
author = { R.Indhumathi, Dr.S.Sathiyabama },
title = { Article:Reducing and Clustering high Dimensional Data through Principal Component Analysis },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 11 },
number = { 8 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { },
doi = { 10.5120/1606-2158 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:59:59.900963+05:30
%A R.Indhumathi
%A Dr.S.Sathiyabama
%T Article:Reducing and Clustering high Dimensional Data through Principal Component Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 8
%P 1-4
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

High dimensional data is a common phenomenon in real-world data mining applications. Developing effective clustering methods for high dimensional datasets is a challenging problem due to the curse of dimensionality. The k-means clustering algorithm is commonly used, but it is time consuming and computationally expensive, and the quality of the resulting clusters depends on the selection of the initial centroids and on the dimension of the data. When the dimension of the dataset is high, the accuracy of the result may fall short of expectations, because the chosen dataset cannot be assumed to be free of noise and flaws. Hence, to improve the efficiency and accuracy of mining tasks on high dimensional data, the data must be pre-processed by an efficient dimensionality reduction method. This paper proposes a method in which the high dimensional data is reduced through Principal Component Analysis, and bisecting k-means clustering, which requires no initialization of the centroids, is then performed on the reduced data.
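
The abstract gives no implementation details, so the following is only a minimal sketch of the two-stage idea (reduce with Principal Component Analysis, then cluster the reduced data with bisecting k-means), not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`pca_reduce`, `two_means`, `bisecting_kmeans`), the use of power iteration with deflation to obtain the principal components, the rule of always splitting the cluster with the largest sum of squared errors, the deterministic seeding of each two-way split from two far-apart points, and the synthetic six-dimensional data.

```python
import math
import random


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pca_reduce(rows, n_components, iters=200, seed=0):
    """Project rows onto their leading principal components.

    Eigenvectors of the covariance matrix are estimated by power iteration
    with deflation (an illustrative choice that avoids external libraries).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = unit([rng.random() + 0.1 for _ in range(d)])
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            if not any(w):  # no variance left in any direction
                break
            v = unit(w)
        cv = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = sum(x * y for x, y in zip(v, cv))  # Rayleigh quotient
        components.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, iters=50):
    """Split points in two; seeds are two far-apart points (an assumption)."""
    c = centroid(points)
    a = max(points, key=lambda p: sq_dist(p, c))
    b = max(points, key=lambda p: sq_dist(p, a))
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, a) <= sq_dist(p, b)]
        new_right = [p for p in points if sq_dist(p, a) > sq_dist(p, b)]
        if not new_left or not new_right or new_left == left:
            break
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda j: sse(clusters[j]))
        if sse(clusters[i]) == 0:  # nothing left to split
            break
        clusters.extend(two_means(clusters.pop(i)))
    return clusters


# Synthetic data: three groups of 10 points in 6 dimensions (hypothetical).
noise = random.Random(42)
centers = [[0] * 6, [6, 6, 6, 0, 0, 0], [0, 0, 0, 20, 20, 20]]
data = [[c + noise.uniform(-0.5, 0.5) for c in center]
        for center in centers for _ in range(10)]

reduced = pca_reduce(data, n_components=2)   # 6 dimensions -> 2
clusters = bisecting_kmeans(reduced, k=3)    # cluster the reduced data
print([len(c) for c in clusters])
```

In this sketch no random set of k starting centroids is chosen: the clustering starts from one cluster holding all points and grows by successive two-way splits, which is one reading of the abstract's "no initialization of the centroids". Each split still needs two starting points, and here they are picked deterministically from the data.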

Index Terms

Computer Science
Information Sciences


Keywords: K-means, Dimensionality Reduction, Principal Component Analysis