
CATCLUS – A Proposed Algorithm for Clustering Categorical Data

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Srikanta Kolay, Kumar S. Ray

Srikanta Kolay and Kumar S. Ray. Article: CATCLUS – A Proposed Algorithm for Clustering Categorical Data. International Journal of Computer Applications 139(10):40-44, April 2016. Published by Foundation of Computer Science (FCS), NY, USA.

BibTeX

	author = {Srikanta Kolay and Kumar S. Ray},
	title = {Article: CATCLUS – A Proposed Algorithm for Clustering Categorical Data},
	journal = {International Journal of Computer Applications},
	year = {2016},
	volume = {139},
	number = {10},
	pages = {40-44},
	month = {April},
	note = {Published by Foundation of Computer Science (FCS), NY, USA}


Classification of categorical data always involves more complexity than classification of numerical data, because a firm outline cannot be drawn for categorical data. Researchers follow different assumptions when treating this kind of data. Moreover, dissimilarity measures designed for numerical data cannot be applied directly to categorical data. This paper proposes a new clustering algorithm for categorical data that uses a newly devised dissimilarity measure. The paper is limited to a theoretical description of the proposed algorithm, illustrated with an appropriate example.
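To make the point about dissimilarity measures concrete, the sketch below shows the simple matching dissimilarity, a common baseline for categorical records that counts the attributes on which two records disagree. It is not the measure devised in this paper, which the abstract does not specify; the function name, the records, and their attributes (colour, shape, size) are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def simple_matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    # A numerical difference such as a - b is undefined for labels like
    # "red" and "blue", so each attribute contributes 0 (match) or 1 (mismatch).
    return sum(x != y for x, y in zip(a, b))


# Hypothetical records with attributes (colour, shape, size).
r1 = ("red", "round", "small")
r2 = ("red", "square", "small")
r3 = ("blue", "square", "large")

d12 = simple_matching_dissimilarity(r1, r2)  # records differ in shape only
d13 = simple_matching_dissimilarity(r1, r3)  # records differ in every attribute
```

Under this baseline every mismatch weighs the same, whichever attribute or values are involved. That is one of the simplifying assumptions a purpose-built measure may choose to relax.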




Categorical Data, Clustering, Dissimilarity Measure, Algorithm.