Clustering Mixed Data Set by Fuzzy Set Partitioning

Nipjyoti Sarma; Arindam Saha; Adarsh Pradhan

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 144 - Number 6

Year of Publication: 2016

Authors: Nipjyoti Sarma, Arindam Saha, Adarsh Pradhan

10.5120/ijca2016910305

Nipjyoti Sarma, Arindam Saha, Adarsh Pradhan. Clustering Mixed Data Set by Fuzzy Set Partitioning. International Journal of Computer Applications. 144, 6 (Jun 2016), 8-12. DOI=10.5120/ijca2016910305

@article{ 10.5120/ijca2016910305,
author = { Nipjyoti Sarma and Arindam Saha and Adarsh Pradhan },
title = { Clustering Mixed Data Set by Fuzzy Set Partitioning },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2016 },
volume = { 144 },
number = { 6 },
month = { Jun },
year = { 2016 },
issn = { 0975-8887 },
pages = { 8-12 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume144/number6/25181-2016910305/ },
doi = { 10.5120/ijca2016910305 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:46:53.139425+05:30
%A Nipjyoti Sarma
%A Arindam Saha
%A Adarsh Pradhan
%T Clustering Mixed Data Set by Fuzzy Set Partitioning
%J International Journal of Computer Applications
%@ 0975-8887
%V 144
%N 6
%P 8-12
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

K-means clustering is a very popular algorithm for clustering numerical data. It is popular because it is simple to understand and has linear time complexity. However, it has the serious limitation that it can cluster only numerical data. Several researchers have therefore tried to improve the k-means algorithm so that it can cluster categorical as well as numerical datasets. In this work, an effort has been made to put forward the proposed FCV mean algorithm, a modified version of the traditional k-means algorithm that is able to cluster objects having mixed-type attributes, i.e. numerical and categorical. For categorical data, fuzzy set similarity is used, and for numerical data, the difference from the maximum dissimilarity is used. Experiments show that the mixed data are clustered with high accuracy compared to other approaches in the literature.
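The general idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FCV mean algorithm, whose exact formulas are not given here. It assumes that a cluster's categorical centroid is a fuzzy set whose membership degrees are the relative frequencies of each value in the cluster, that numerical similarity is one minus the range-normalised distance to the cluster mean, and that the two are simply summed. All function names (`fuzzy_centroid`, `similarity`, `cluster_mixed`), the seeding with the first k objects, and the toy data are hypothetical.

```python
from collections import Counter

def fuzzy_centroid(values):
    """Fuzzy set over category values: membership = relative frequency (assumption)."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def make_centroid(members, num_idx, cat_idx):
    """Mean for numerical attributes, fuzzy set for categorical attributes."""
    centroid = {}
    for j in num_idx:
        centroid[j] = sum(m[j] for m in members) / len(members)
    for j in cat_idx:
        centroid[j] = fuzzy_centroid([m[j] for m in members])
    return centroid

def similarity(obj, centroid, num_idx, cat_idx, ranges):
    """Sum of per-attribute similarities, each in [0, 1] (assumed combination rule)."""
    score = 0.0
    for j in num_idx:
        # Maximum dissimilarity is the attribute range; subtract the normalised distance.
        score += 1.0 if ranges[j] == 0 else 1.0 - abs(obj[j] - centroid[j]) / ranges[j]
    for j in cat_idx:
        # Membership degree of the object's value in the centroid's fuzzy set.
        score += centroid[j].get(obj[j], 0.0)
    return score

def cluster_mixed(data, k, num_idx, cat_idx, max_iter=20):
    """k-means-style loop: assign to the most similar centroid, then recompute."""
    ranges = {j: max(r[j] for r in data) - min(r[j] for r in data) for j in num_idx}
    # Deterministic seeding for this sketch: the first k objects are the initial centroids.
    centroids = [make_centroid([data[i]], num_idx, cat_idx) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(k), key=lambda c: similarity(obj, centroids[c], num_idx, cat_idx, ranges))
            for obj in data
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [obj for obj, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = make_centroid(members, num_idx, cat_idx)
    return labels, centroids

# Toy mixed dataset: one numerical attribute (index 0), one categorical (index 1).
data = [(1.0, "red"), (9.0, "blue"), (1.5, "red"),
        (8.5, "blue"), (2.0, "red"), (9.5, "green")]
labels, centroids = cluster_mixed(data, k=2, num_idx=[0], cat_idx=[1])
print(labels)
```

In this sketch, an unseen category value simply contributes zero similarity, so an object such as `(9.5, "green")` is still placed by its numerical attribute. A real implementation would need a weighting between attribute types and a more careful initialisation.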

Index Terms

Computer Science

Information Sciences

Keywords

Fuzzy set, centroid vector, dissimilarity, categorical, numerical.