Research Article

New Method for Finding Initial Cluster Centroids in K-means Algorithm

by Harmanpreet Singh, Kamaljit Kaur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 6
Year of Publication: 2013
Authors: Harmanpreet Singh, Kamaljit Kaur

Harmanpreet Singh, Kamaljit Kaur. New Method for Finding Initial Cluster Centroids in K-means Algorithm. International Journal of Computer Applications. 74, 6 (July 2013), 27-30. DOI=10.5120/12890-9837

@article{ 10.5120/12890-9837,
author = { Harmanpreet Singh and Kamaljit Kaur },
title = { New Method for Finding Initial Cluster Centroids in K-means Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 74 },
number = { 6 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 27-30 },
numpages = {9},
url = { },
doi = { 10.5120/12890-9837 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:41:32.404943+05:30
%A Harmanpreet Singh
%A Kamaljit Kaur
%T New Method for Finding Initial Cluster Centroids in K-means Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 6
%P 27-30
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Data mining is a field of computer science concerned with the automated extraction of patterns of knowledge implicitly stored in large databases, data warehouses, and other large data repositories. Clustering is a data mining task that groups objects automatically on the basis of their nearness to a central value, so that elements within the same cluster are more similar to one another than to elements in other clusters. K-Means is a widely used clustering method because it is simple and efficient. Its output depends on the central values chosen for clustering, so the accuracy of the K-Means algorithm depends heavily on those initial values. The original K-Means method chooses the initial cluster centroids randomly, which affects its performance. This paper presents a new method for finding initial cluster centroids for K-Means.
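
The dependence of the result on the initial centroids can be illustrated with a minimal sketch. It implements standard K-Means (Lloyd's iteration), not the method proposed in the paper, whose details are not given on this page. The function names, the four-point dataset, and the use of the sum of squared errors (SSE) as a quality measure are all assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centroids, max_iterations=100):
    """Standard K-Means starting from the given initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the arithmetic mean of its cluster.
        new_centroids = []
        for centroid, cluster in zip(centroids, clusters):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroid)  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assign(points, centroids)


def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters)
               for p in cluster)


def random_initial_centroids(points, k, seed=None):
    """Random initialization: pick k distinct data points as centroids."""
    return random.Random(seed).sample(points, k)


# Illustrative data: the four corners of a wide rectangle, with k = 2.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]

# Start A: one centroid on each short side of the rectangle.
good_centroids, good_clusters = k_means(points, [(0, 0), (10, 0)])
good_sse = sse(good_centroids, good_clusters)

# Start B: both centroids on the same short side.
bad_centroids, bad_clusters = k_means(points, [(0, 0), (0, 2)])
bad_sse = sse(bad_centroids, bad_clusters)

print("start A:", good_centroids, "SSE =", good_sse)
print("start B:", bad_centroids, "SSE =", bad_sse)
```

Both runs converge, but start B settles on a split along the long axis with a much larger SSE than start A. Calling `k_means(points, random_initial_centroids(points, 2))` may land on either outcome depending on the draw, which is the sensitivity that initialization methods aim to reduce.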

  1. Jiawei Han, Data mining: concepts and techniques (Morgan Kaufmann Publishers, 2006).
  2. Margaret H. Dunham, Data mining: introductory and advanced concepts (Pearson Education, 2006).
  3. Pena, J. M., Lozano, J. A., Larranaga, P., An empirical comparison of four initialization methods for the K-Means algorithm, Pattern Recognition Letters 20 (1999), pp. 1027-1040.
  4. Anderberg, M., Cluster analysis for applications (Academic Press, New York, 1973).
  5. Tou, J., Gonzalez, R., Pattern Recognition Principles (Addison-Wesley, Reading, MA, 1974).
  6. Katsavounidis, I., Kuo, C., Zhang, Z., A new initialization technique for generalized Lloyd iteration, IEEE Signal Processing Letters 1 (10), 1994, pp. 144-146.
  7. Bradley, P. S., Fayyad, U. M., Refining initial points for K-Means clustering, Proc. 15th International Conf. on Machine Learning, San Francisco, CA, 1998, pp. 91-99.
  8. Kohei Arai and Ali Ridho Barakbah, Hierarchical k-means: an algorithm for centroids initialization for k-means, Reports of the Faculty of Science and Engineering, Saga University, vol. 36, no. 1, 2007.
  9. Samarjeet Borah, M. K. Ghose, Performance Analysis of AIM-K-means & K-means in Quality Cluster Generation, Journal of Computing, vol. 1, issue 1, December 2009.
  10. Ye Yunming, Advances in knowledge discovery and data mining (Springer, 2006).
  11. K. A. Abdul Nazeer and M. P. Sebastian, Improving the accuracy and efficiency of the k-means clustering algorithm, Proceedings of the World Congress on Engineering, London, UK, vol. 1, 2009.
  12. Madhu Yedla, S. R. Pathakota, T. M. Srinivasa, Enhancing K-means Clustering Algorithm with Improved Initial Centre, International Journal of Computer Science and Information Technologies, 1 (2), 2010, pp. 121-125.
Index Terms

Computer Science
Information Sciences


Arithmetic Mean Clustering Cluster Distance Efficiency Partitions