Initializing K-Means Clustering Algorithm using Statistical Information

Mohammad F. Eltibi; Wesam M. Ashour

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 29 - Number 7

Year of Publication: 2011

Authors: Mohammad F. Eltibi, Wesam M. Ashour

DOI: 10.5120/3573-4930

Mohammad F. Eltibi, Wesam M. Ashour. Initializing K-Means Clustering Algorithm using Statistical Information. International Journal of Computer Applications. 29, 7 (September 2011), 51-55. DOI=10.5120/3573-4930

@article{10.5120/3573-4930,
  author = {Mohammad F. Eltibi and Wesam M. Ashour},
  title = {Initializing K-Means Clustering Algorithm using Statistical Information},
  journal = {International Journal of Computer Applications},
  issue_date = {September 2011},
  volume = {29},
  number = {7},
  month = {September},
  year = {2011},
  issn = {0975-8887},
  pages = {51-55},
  numpages = {5},
  url = {https://ijcaonline.org/archives/volume29/number7/3573-4930/},
  doi = {10.5120/3573-4930},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:15:13.043867+05:30
%A Mohammad F. Eltibi
%A Wesam M. Ashour
%T Initializing K-Means Clustering Algorithm using Statistical Information
%J International Journal of Computer Applications
%@ 0975-8887
%V 29
%N 7
%P 51-55
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The k-means clustering algorithm is one of the best-known clustering algorithms. Nevertheless, it has several disadvantages, chief among them that it may converge to a local optimum depending on the random initialization of its prototypes. We propose an enhancement to the initialization process of k-means that uses statistical information from the data set to initialize the prototypes. We show that our algorithm gives valid clusters and that it decreases both error and time.
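The sensitivity to initialization described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which this page does not specify. It assumes one-dimensional data and a hypothetical `stats_init` helper that fits a single normal distribution to the data (maximum likelihood mean and standard deviation) and places the k initial prototypes at evenly spaced quantiles of that fit. The function names, the synthetic data, and the quantile rule are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean, pstdev


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        updated = [fmean(cl) if cl else c for cl, c in zip(clusters, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


def stats_init(points, k):
    """Hypothetical statistics-based initialization (an assumption, not the
    paper's method): fit one normal distribution by maximum likelihood and
    use its (j + 0.5) / k quantiles as prototypes. Requires nonzero spread."""
    fitted = NormalDist(fmean(points), pstdev(points))
    return [fitted.inv_cdf((j + 0.5) / k) for j in range(k)]


# Synthetic data (assumed for the demo): three groups around 0, 5 and 10.
rng = random.Random(0)
data = [rng.gauss(mu, 0.5) for mu in (0.0, 5.0, 10.0) for _ in range(50)]

# Random initialization: the final error depends on which points were drawn.
random_sse = [sse(data, kmeans(data, rng.sample(data, 3))) for _ in range(20)]

# Statistics-based initialization: deterministic for a given data set.
initial_sse = sse(data, stats_init(data, 3))
stats_sse = sse(data, kmeans(data, stats_init(data, 3)))

print("random init, best and worst SSE:", min(random_sse), max(random_sse))
print("statistics-based init SSE:", stats_sse)
```

Comparing the best and worst error over the random restarts with the single deterministic run gives a rough sense of how much the starting prototypes matter. A data-driven start also removes the need for repeated restarts, which is one plausible way initialization could reduce running time.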

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, K-means Clustering, Initial Prototypes Determination, Central Limit Theory, Normal Distribution, Maximum Likelihood Estimator