Research Article

Evolving Efficient Clustering Patterns in Liver Patient Data through Data Mining Techniques

by Pankaj Saxena, Vineeta Singh, Sushma Lehri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 16
Year of Publication: 2013
Authors: Pankaj Saxena, Vineeta Singh, Sushma Lehri
10.5120/11169-6342

Pankaj Saxena, Vineeta Singh, Sushma Lehri. Evolving Efficient Clustering Patterns in Liver Patient Data through Data Mining Techniques. International Journal of Computer Applications. 66, 16 (March 2013), 23-28. DOI=10.5120/11169-6342

@article{ 10.5120/11169-6342,
author = { Pankaj Saxena and Vineeta Singh and Sushma Lehri },
title = { Evolving Efficient Clustering Patterns in Liver Patient Data through Data Mining Techniques },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 16 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 23-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume66/number16/11169-6342/ },
doi = { 10.5120/11169-6342 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:22:35.345321+05:30
%A Pankaj Saxena
%A Vineeta Singh
%A Sushma Lehri
%T Evolving Efficient Clustering Patterns in Liver Patient Data through Data Mining Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 16
%P 23-28
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is one of the most important research areas in data mining. Put simply, clustering divides data into groups such that objects in the same group are similar to one another and dissimilar to objects in other groups. It aims to maximize intra-class similarity while minimizing inter-class similarity. Clustering is an unsupervised learning technique that is useful for extracting interesting patterns and structures from large data sets. It can be applied in many areas, such as marketing studies, DNA analysis, city planning, text mining, and web document classification. Large datasets with many attributes make clustering complex, and many methods have been developed to deal with these problems. In this paper, two well-known partitioning-based methods, k-means and k-medoids, are compared on health data. The paper also proposes an improved k-means medoids clustering algorithm. The proposed algorithm is evaluated on a health dataset, i.e., the Liver dataset, and its results are compared with those of the earlier algorithms. The proposed algorithm is more effective in terms of computation time than the k-means and k-medoids clustering algorithms. The algorithms under consideration are evaluated using four metrics: Rand Index, Jaccard Coefficient, Fowlkes and Mallows index, and run time. Experimental results are obtained using WEKA, a data mining tool.
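
The three external validity metrics named in the abstract are commonly defined by counting pairs of samples on which a reference labeling and a clustering agree or disagree. The following is a minimal Python sketch of those textbook pair-counting definitions, not the authors' WEKA-based implementation; the paper's exact formulation is not shown on this page. The function names and the small label lists are hypothetical, and degenerate inputs (fewer than two samples, or no pair sharing a class or cluster) are not handled.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Classify every pair of samples by agreement between two labelings.

    a: same class and same cluster
    b: same class, different cluster
    c: different class, same cluster
    d: different class and different cluster
    """
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(truth, pred):
    # Fraction of all pairs on which the two labelings agree.
    a, b, c, d = pair_counts(truth, pred)
    return (a + d) / (a + b + c + d)


def jaccard_coefficient(truth, pred):
    # Like the Rand index, but ignores pairs separated in both labelings.
    a, b, c, _ = pair_counts(truth, pred)
    return a / (a + b + c)


def fowlkes_mallows(truth, pred):
    # Geometric mean of pairwise precision a/(a+c) and recall a/(a+b).
    a, b, c, _ = pair_counts(truth, pred)
    return a / sqrt((a + b) * (a + c))


if __name__ == "__main__":
    # Hypothetical labels for six samples; not taken from the Liver dataset.
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(rand_index(truth, pred))
    print(jaccard_coefficient(truth, pred))
    print(fowlkes_mallows(truth, pred))
```

All three scores lie between 0 and 1, with higher values indicating closer agreement with the reference labels. The pairwise loop is quadratic in the number of samples, which is acceptable for a small dataset but would call for a contingency-table formulation on larger ones.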

Index Terms

Computer Science
Information Sciences

Keywords

Rand index (RI), Jaccard Coefficient, Fowlkes and Mallows (FM) index, Silhouette Index