Comparing K-Value Estimation for Categorical and Numeric Data Clustering

K.Arunprabha; V.Bhuvaneswari

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 11 - Number 3

Year of Publication: 2010

Authors: K.Arunprabha, V.Bhuvaneswari

DOI: 10.5120/1565-1875

K. Arunprabha, V. Bhuvaneswari. Comparing K-Value Estimation for Categorical and Numeric Data Clustering. International Journal of Computer Applications. 11, 3 (December 2010), 4-7. DOI=10.5120/1565-1875

@article{10.5120/1565-1875,
  author = {K. Arunprabha and V. Bhuvaneswari},
  title = {Comparing K-Value Estimation for Categorical and Numeric Data Clustering},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2010},
  volume = {11},
  number = {3},
  month = {December},
  year = {2010},
  issn = {0975-8887},
  pages = {4-7},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume11/number3/1565-1875/},
  doi = {10.5120/1565-1875},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T19:59:38.580263+05:30
%A K.Arunprabha
%A V.Bhuvaneswari
%T Comparing K-Value Estimation for Categorical and Numeric Data Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 11
%N 3
%P 4-7
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In data mining, clustering is one of the major tasks. It aims to group data objects into meaningful classes (clusters) such that the similarity of objects within a cluster is maximized and the similarity of objects from different clusters is minimized. When clustering a dataset, the right number k of clusters is often not obvious, and choosing k automatically is a hard algorithmic problem. We present an improved algorithm for learning k while clustering categorical data. The algorithm applies Gaussian-means (G-means) within the k-means paradigm and works well for categorical features. To apply the algorithm to a categorical dataset, the dataset is first converted into a numeric one. We present a novel heuristic technique for this conversion and compare the clustering of categorical data with that of numeric data. The G-means algorithm is based on a statistical test of the hypothesis that a subset of the data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. G-means requires only one intuitive parameter, the standard statistical significance level α.
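The Gaussian test at the heart of G-means can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`anderson_darling_normal`, `looks_gaussian`), the deterministic choice of the two initial child centers, and the fixed iteration count are assumptions made for illustration. The small-sample correction factor and the critical value 1.8692 for α = 0.0001 are taken, as an assumption, from Hamerly and Elkan's description of G-means. The sketch splits one cluster into two children with 2-means, projects the points onto the line joining the children, and applies an Anderson-Darling normality test to the projection.

```python
import math
from statistics import NormalDist, fmean, stdev

# Critical value for significance level alpha = 0.0001, as quoted by
# Hamerly and Elkan for the corrected statistic (an assumption of this sketch).
CRITICAL_VALUE = 1.8692


def anderson_darling_normal(values):
    """Corrected Anderson-Darling statistic for normality of a 1-D sample,
    with mean and standard deviation estimated from the sample itself."""
    n = len(values)
    mu, sigma = fmean(values), stdev(values)
    if sigma == 0:
        return 0.0  # degenerate sample: all values equal, nothing to split
    cdf = NormalDist().cdf
    eps = 1e-12  # keep the CDF values strictly inside (0, 1) for log()
    z = sorted(min(max(cdf((x - mu) / sigma), eps), 1 - eps) for x in values)
    total = sum(
        (2 * i - 1) * (math.log(z[i - 1]) + math.log(1 - z[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    return a2 * (1 + 4 / n - 25 / n ** 2)  # small-sample correction


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def looks_gaussian(points, iters=10):
    """Return True if the cluster passes the Gaussian test (keep one center),
    False if it should be split into two centers."""
    dim = len(points[0])
    center = [fmean(p[j] for p in points) for j in range(dim)]
    # Simplified initialisation (assumption): the point farthest from the
    # center and its mirror image through the center.
    far = max(points, key=lambda p: sq_dist(p, center))
    children = [list(far), [2 * c - f for c, f in zip(center, far)]]
    for _ in range(iters):  # plain 2-means on this cluster only
        groups = ([], [])
        for p in points:
            nearer_second = sq_dist(p, children[1]) < sq_dist(p, children[0])
            groups[1 if nearer_second else 0].append(p)
        if not groups[0] or not groups[1]:
            break
        children = [[fmean(p[j] for p in g) for j in range(dim)] for g in groups]
    # Project every point onto the direction joining the two children.
    v = [a - b for a, b in zip(children[0], children[1])]
    projected = [sum(a * b for a, b in zip(p, v)) for p in points]
    return anderson_darling_normal(projected) <= CRITICAL_VALUE


# Deterministic toy data: evenly spaced standard-normal quantiles.
quantiles = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
one_blob = [[q] for q in quantiles]
two_blobs = [[q - 5] for q in quantiles] + [[q + 5] for q in quantiles]

print(looks_gaussian(one_blob))
print(looks_gaussian(two_blobs))
```

In a full G-means loop, every center whose points fail this check would be replaced by its two children, k-means would be rerun on the whole dataset, and the process would repeat until all clusters pass. For categorical data, the points passed in would be the numeric vectors produced by the conversion step, which is not sketched here.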

References

"Anderson-Darling: A Goodness of Fit Test for Small Samples Assumptions", START, Vol. 10, No. 5.
Ahmed M. Sultan, Hala Mahmoud Khaleel, "A new modified Goodness of fit tests for type 2 censored sample from Normal population".
Blake, C.L. and Merz, C.J., "UCI repository of machine learning databases", 1998.
Chris Ding, Xiaofeng He, Hongyuan Zha, and Horst Simon, "Adaptive dimension reduction for clustering high dimensional data". In Proceedings of the 2nd IEEE International Conference on Data Mining, 2002.
Dongmin Cai and Stephen S-T Yau, "Categorical Clustering By Converting Associated Information", International Journal of Computer Science 1;1, 2006.
Greg Hamerly, Charles Elkan, "Learning the k in k-means".
Gregory James Hamerly, "Learning structure and concepts in data through data clustering", 2001.
Jain, A.K., Murty, M.N., and Flynn, P.J., "Data clustering: a review". ACM Computing Surveys, 1999.
Stephens, M.A., "EDF statistics for goodness of fit and some comparisons". Journal of the American Statistical Association, September 1974.
Zhang, Y., Fu, A., Cai, C., and Heng, P., "Clustering categorical data", 2000.
Zhexue Huang, "Extensions to the K-means algorithm for clustering Large Data sets with categorical value", 1998.

Index Terms

Computer Science

Information Sciences

Keywords

Data mining, Clustering Algorithm, Categorical data, Gaussian Distribution