Research Article

Rule based Classification for Diabetic Patients using Cascaded K-Means and Decision Tree C4.5

by Asha Gowda Karegowda, Punya V, M. A. Jayaram, A. S. Manjunath
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 12
Year of Publication: 2012
DOI: 10.5120/6836-9460

Asha Gowda Karegowda, Punya V, M. A. Jayaram, A. S. Manjunath. Rule based Classification for Diabetic Patients using Cascaded K-Means and Decision Tree C4.5. International Journal of Computer Applications. 45, 12 (May 2012), 45-50. DOI=10.5120/6836-9460

@article{ 10.5120/6836-9460,
author = { Asha Gowda Karegowda and Punya V and M. A. Jayaram and A. S. Manjunath },
title = { Rule based Classification for Diabetic Patients using Cascaded K-Means and Decision Tree C4.5 },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 45 },
number = { 12 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 45-50 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume45/number12/6836-9460/ },
doi = { 10.5120/6836-9460 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:37:28.574103+05:30
%A Asha Gowda Karegowda
%A Punya V
%A M. A. Jayaram
%A A. S. Manjunath
%T Rule based Classification for Diabetic Patients using Cascaded K-Means and Decision Tree C4.5
%J International Journal of Computer Applications
%@ 0975-8887
%V 45
%N 12
%P 45-50
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Medical data mining is the process of extracting hidden patterns from medical data. This paper presents a hybrid model for classifying the Pima Indian diabetic database (PIDD). The model consists of two stages. In the first stage, K-means clustering is used to identify and eliminate incorrectly classified instances. The continuous data is converted to categorical form using approximate interval widths chosen on the advice of a medical expert. In the second stage, a fine-tuned classification is performed with the decision tree C4.5, using only the correctly clustered instances from the first stage. Experimental results indicate that cascading K-means clustering with C4.5 improves on the classification accuracy of C4.5 alone. Furthermore, the rules generated by the cascaded C4.5 tree with categorical data are fewer in number and easier to interpret than the rules generated by C4.5 alone with continuous data. The proposed cascaded model with categorical data achieved a classification accuracy of 93.33%, compared with 73.62% for C4.5 alone, on the Pima Indian diabetic dataset.
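
The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of k = 2, the seeding of centroids with the first k points, the toy (glucose, BMI) values, and the bin boundaries are all assumptions made for illustration. The sketch clusters the instances with K-means, names each cluster after the majority class of its members, keeps only the instances whose class label agrees with their cluster, and maps a continuous value to an equal-width category. The second stage would pass the kept, discretized instances to a C4.5 learner, which is not shown here.

```python
import math
from collections import Counter


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def keep_correctly_clustered(points, labels, k=2):
    """Drop instances whose class label disagrees with their cluster's majority class."""
    assign = kmeans(points, k)
    majority = {
        c: Counter(l for l, a in zip(labels, assign) if a == c).most_common(1)[0][0]
        for c in set(assign)
    }
    return [(p, l) for p, l, a in zip(points, labels, assign) if majority[a] == l]


def equal_width_bin(value, low, width, n_bins):
    """Map a continuous value to a category index using equal-width intervals."""
    idx = int((value - low) // width)
    return max(0, min(n_bins - 1, idx))


# Hypothetical toy data, not taken from PIDD: (glucose, BMI) pairs with a 0/1 class.
points = [(85, 22), (90, 24), (95, 23), (160, 35),
          (170, 38), (165, 36), (92, 23), (168, 37)]
labels = [0, 0, 0, 1, 1, 1, 1, 0]  # the last two labels disagree with their clusters

kept = keep_correctly_clustered(points, labels)

# Hypothetical glucose bins of width 40 starting at 80 (0 = low, 1 = mid, 2 = high).
categorical = [(equal_width_bin(p[0], 80, 40, 3), l) for p, l in kept]
```

On this toy data the two instances whose labels contradict their cluster are discarded, so six of the eight instances remain for the second stage. In the paper the interval widths come from a medical expert rather than from a fixed formula, so `equal_width_bin` only stands in for that step.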

Index Terms

Computer Science
Information Sciences

Keywords

K-means Clustering, Categorical Data, Rule Based Classification, Decision Tree C4.5, Pima Indian Diabetics