Research Article

Ensemble Fuzzy Clustering for Mixed Numeric and Categorical Data

by J. Suguna, M. Arul Selvi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 42 - Number 3
Year of Publication: 2012
Authors: J. Suguna, M. Arul Selvi
10.5120/5673-7705

J. Suguna, M. Arul Selvi. Ensemble Fuzzy Clustering for Mixed Numeric and Categorical Data. International Journal of Computer Applications. 42, 3 (March 2012), 19-23. DOI=10.5120/5673-7705

@article{ 10.5120/5673-7705,
author = { J. Suguna and M. Arul Selvi },
title = { Ensemble Fuzzy Clustering for Mixed Numeric and Categorical Data },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 42 },
number = { 3 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 19-23 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume42/number3/5673-7705/ },
doi = { 10.5120/5673-7705 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:30:32.308671+05:30
%A J. Suguna
%A M. Arul Selvi
%T Ensemble Fuzzy Clustering for Mixed Numeric and Categorical Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 42
%N 3
%P 19-23
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In data mining, clustering is a major task that groups data objects into meaningful classes (clusters) so that the similarity of objects within a cluster is maximized and the similarity of objects in different clusters is minimized. A dataset may be mixed, that is, it may contain both numeric and categorical attributes, and these two types of data differ in their characteristics. Because of these differences, mixed data is better grouped with an ensemble clustering method, which uses a split-and-merge approach. In this paper, the original mixed dataset is split into a numeric dataset and a categorical dataset, and each is clustered using both traditional clustering algorithms (K-Means and K-Modes) and fuzzy clustering algorithms (Fuzzy C-Means and Fuzzy C-Modes). The resulting clusters are combined using ensemble clustering methods and evaluated with both the F-measure and the entropy measure. The results show that splitting the dataset is beneficial and that fuzzy clustering algorithms yield better results than traditional clustering algorithms.
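
The split-and-merge pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the hard (K-Means and K-Modes) variants, not the fuzzy ones. The merge step (re-clustering the pairs of base cluster labels with K-Modes), the seeding from the first k distinct rows, the toy data, and all function names are assumptions made for illustration. The entropy and F-measure follow one common definition of each; the abstract does not state which formulas the paper uses.

```python
from collections import Counter
from math import log2

def split_mixed(records, numeric_idx):
    """Split each record into its numeric and categorical attributes."""
    numeric_idx = set(numeric_idx)
    numeric = [[v for i, v in enumerate(r) if i in numeric_idx] for r in records]
    categorical = [[v for i, v in enumerate(r) if i not in numeric_idx] for r in records]
    return numeric, categorical

def lloyd(rows, k, dist, centre, max_iter=50):
    """Generic assign/update loop; the first k distinct rows seed the centres (assumption)."""
    centres = list(dict.fromkeys(tuple(r) for r in rows))[:k]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centres)), key=lambda c: dist(r, centres[c])) for r in rows]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for c in range(len(centres)):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:
                centres[c] = centre(members)
    return labels

def k_means(rows, k):
    """Squared Euclidean distance, per-attribute mean as centre."""
    return lloyd(rows, k,
                 lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)),
                 lambda m: tuple(sum(col) / len(m) for col in zip(*m)))

def k_modes(rows, k):
    """Simple-matching distance, per-attribute mode as centre."""
    return lloyd(rows, k,
                 lambda a, b: sum(x != y for x, y in zip(a, b)),
                 lambda m: tuple(Counter(col).most_common(1)[0][0] for col in zip(*m)))

def clustering_entropy(labels, classes):
    """Size-weighted average of per-cluster class entropy (lower is better)."""
    total = 0.0
    for c in set(labels):
        members = [t for l, t in zip(labels, classes) if l == c]
        h = -sum((m / len(members)) * log2(m / len(members))
                 for m in Counter(members).values())
        total += len(members) / len(labels) * h
    return total

def f_measure(labels, classes):
    """Class-size-weighted best F1 over all clusters (higher is better)."""
    score = 0.0
    for cls, n_cls in Counter(classes).items():
        best = 0.0
        for c, n_c in Counter(labels).items():
            both = sum(1 for l, t in zip(labels, classes) if l == c and t == cls)
            if both:
                p, r = both / n_c, both / n_cls
                best = max(best, 2 * p * r / (p + r))
        score += n_cls / len(labels) * best
    return score

# Hypothetical toy data: two numeric and two categorical attributes per record.
records = [(1.0, 1.1, "red", "small"), (9.0, 9.2, "blue", "large"),
           (1.2, 0.9, "red", "small"), (8.8, 9.1, "blue", "large"),
           (0.9, 1.0, "red", "small"), (9.1, 8.9, "blue", "large")]
classes = ["a", "b", "a", "b", "a", "b"]  # known classes, used only for evaluation

numeric, categorical = split_mixed(records, numeric_idx=[0, 1])  # split
num_labels = k_means(numeric, 2)                                 # cluster the numeric part
cat_labels = k_modes(categorical, 2)                             # cluster the categorical part
combined = list(zip(num_labels, cat_labels))                     # merge: label pairs as categorical data
final_labels = k_modes(combined, 2)

entropy = clustering_entropy(final_labels, classes)
f_score = f_measure(final_labels, classes)
```

Because the merge step sees only cluster labels, it can use a categorical algorithm regardless of which base algorithms produced the labels. A fuzzy variant would replace the hard assignments with membership degrees.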

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Ensemble Clustering, Mixed Dataset, Numeric Dataset, Categorical Dataset