Research Article

Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets

by M. Ramakrishnan, D. Tennyson Jyaraj
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 89 - Number 7
Year of Publication: 2014
Authors: M. Ramakrishnan, D. Tennyson Jyaraj
10.5120/15518-4102

M. Ramakrishnan, D. Tennyson Jyaraj. Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets. International Journal of Computer Applications. 89, 7 (March 2014), 39-42. DOI=10.5120/15518-4102

@article{ 10.5120/15518-4102,
author = { M. Ramakrishnan and D. Tennyson Jyaraj },
title = { Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets },
journal = { International Journal of Computer Applications },
issue_date = { March 2014 },
volume = { 89 },
number = { 7 },
month = { March },
year = { 2014 },
issn = { 0975-8887 },
pages = { 39-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume89/number7/15518-4102/ },
doi = { 10.5120/15518-4102 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:08:39.689602+05:30
%A M. Ramakrishnan
%A D. Tennyson Jyaraj
%T Modified K-Means Algorithm for Effective Clustering of Categorical Data Sets
%J International Journal of Computer Applications
%@ 0975-8887
%V 89
%N 7
%P 39-42
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The traditional k-means algorithm is well known for its clustering ability and its efficiency on large data sets. However, it is suited to numeric values only and cannot be used effectively on categorical data sets. In this paper, we present modified k-means algorithms that can perform clustering effectively on mixed data sets. The main intuition behind the proposed method is that all prototypes are potential candidates at the root level; for the children of the root node, the candidate set can be pruned using simple geometrical constraints. The experimental results show that the method is well suited to categorical data sets and that the overall computation time is low.
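
The abstract does not state which geometrical constraint is used for pruning. As an illustration only, the sketch below shows one rule commonly used in tree-based k-means variants: for a node whose points lie inside an axis-aligned bounding box, a prototype can be discarded if its smallest possible distance to the box exceeds the smallest of the largest possible distances over all prototypes. The function names, the bounding-box representation, and the use of squared Euclidean distance on numeric coordinates are assumptions made for this example and are not taken from the paper; categorical attributes would first need a numeric encoding or a different dissimilarity measure.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def min_max_sq_dist(p: Point, lo: Point, hi: Point) -> Tuple[float, float]:
    """Smallest and largest squared Euclidean distance from p to the box [lo, hi]."""
    d_min = 0.0
    d_max = 0.0
    for x, a, b in zip(p, lo, hi):
        # Nearest point of the box along this axis: clamp x into [a, b].
        nearest = min(max(x, a), b)
        d_min += (x - nearest) ** 2
        # Farthest point of the box along this axis lies on one of its two faces.
        d_max += max(abs(x - a), abs(x - b)) ** 2
    return d_min, d_max


def prune_candidates(prototypes: List[Point], lo: Point, hi: Point) -> List[int]:
    """Return indices of prototypes that may still be nearest to some point in the box.

    Hypothetical helper: the pruning rule is an assumption, not the paper's stated method.
    """
    bounds = [min_max_sq_dist(p, lo, hi) for p in prototypes]
    # At least one prototype is within this distance of every point in the box.
    threshold = min(d_max for _, d_max in bounds)
    # A prototype whose best case is worse than that bound can never win.
    return [i for i, (d_min, _) in enumerate(bounds) if d_min <= threshold]


if __name__ == "__main__":
    candidates = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    kept = prune_candidates(candidates, lo=(0.0, 0.0), hi=(1.0, 1.0))
    print(kept)
```

At the root node every prototype is passed in as a candidate; each child node would then call `prune_candidates` with its own bounding box and only the indices its parent kept, so the candidate list can only shrink on the way down the tree.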

References
  1. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proc. of the 1996 ACM SIGMOD Int'l Conf. on Management of Data, Montreal, Canada, pages 103–114, June 1996.
  2. Chien, L. J., Chang, C. C. and Lee, Y. J., "Variant methods of reduced set selection for reduced support vector machines", Journal of Information Science and Engineering, Vol. 26 (1), 2010.
  3. Chien-Chung Chang and Yuh-Jye Lee, "Generating the reduced set by systematic sampling", Lecture Notes in Computer Science, Vol. 3177, 2004.
  4. Emre Çomak, Ahmet Arslan, "A new training method for support vector machines: Clustering k-NN support vector machines", Expert Systems with Applications, Vol. 35, pp. 564–568, 2008.
  5. Gowda, K. C. and Diday, E., "Symbolic clustering using a new dissimilarity measure", Pattern Recognition Letters, Vol. 24 (6), pp. 567-578, 1991.
  6. Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning, 2nd edition, Springer, 2008.
  7. He, Z., Xu, X. and Deng, S., "A cluster ensemble for clustering categorical data", Information Fusion, Vol. 6, pp. 143-15, 2005.
  8. Huang, Z., "Clustering large data sets with mixed numeric and categorical values", Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, Singapore, 1997.
  9. Huang, Z., "Extensions to the k-means algorithm for clustering large data sets with categorical values", Data Mining and Knowledge Discovery, Vol. 2, pp. 283-304, 1998.
  10. Huang, Z., "A note on k-modes clustering", Journal of Classification, Vol. 20, pp. 257-26, 2003.
  11. Huang, C. M., Lee, Y. J., Lin, D. K. J. and Huang, S. Y., "Model selection for support vector machines via uniform design", a special issue on Machine Learning and Robust Data Mining of Computational Statistics and Data Analysis, Vol. 52, pp. 335-346, 2007.
  12. Hsu, C. W., C. C. Chang and C. J. Lin, "Practical guide to support vector classification", Department of Computer Science and Information Engineering, National Taiwan University, 2003.
  13. M. Emre Celebi, Hassan A. Kingravi, Patricio A. Vela, "A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm", Journal of Expert Systems with Applications, 40 (2013) 200-210, September 2012
  14. Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, "OPTICS: Ordering Points To Identify the Clustering Structure", SIGMOD '99 Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Volume 28, Issue 2, June 1999, pages 49-60.
Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Large Data Sets, K-Means algorithm, CLARANS, DBSCAN, Data Mining, Pattern Mining, Rule Mining