
Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 89 - Number 7
Year of Publication: 2014
M. Ramakrishnan
D. Tennyson Jyaraj

M Ramakrishnan and Tennyson D Jyaraj. Article: Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets. International Journal of Computer Applications 89(7):39-42, March 2014. Full text available. BibTeX

	author = {M. Ramakrishnan and D. Tennyson Jyaraj},
	title = {Article: Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {89},
	number = {7},
	pages = {39-42},
	month = {March},
	note = {Full text available}


The traditional k-means algorithm is well known for its clustering ability and its efficiency on large data sets. However, it is suited to numeric values only and cannot be used effectively on categorical data sets. In this paper, we present modified k-means algorithms that can cluster mixed data sets effectively. The main intuition behind our proposed method is that all prototypes are potential candidates at the root level. For the children of the root node, we can prune the candidate set by using simple geometrical constraints. The experimental results show that this method is well suited to categorical data sets and that the overall computation time is minimal.
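The abstract does not state which dissimilarity measure or prototype update the modified algorithm uses, and it gives no detail on the tree-based pruning. As a rough illustration of why plain k-means needs changing for categorical attributes, the sketch below assumes the common k-modes style substitution: simple matching (count of differing attributes) in place of Euclidean distance, and per-attribute modes in place of means. The function names, the sample records, and the choice of initial prototypes are all hypothetical, and this is not the authors' method.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, initial_modes, max_iter=20):
    """Lloyd-style loop with modes as prototypes (assumed, not from the paper)."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest prototype.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the prototypes are too
        labels = new_labels
        # Update step: recompute each prototype as the per-attribute mode.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old prototype if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "large", "square"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

Every record is compared against every prototype here. The pruning described in the abstract would presumably reduce that candidate set per tree node, but how it does so is not specified in this text.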

