An Empirical Comparison by Data Mining Classification Techniques for Diabetes Data Set

Nilesh Jagdish Vispute; Dinesh Kumar Sahu; Anil Rajput

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 131 - Number 2

Year of Publication: 2015

10.5120/ijca2015907238

Nilesh Jagdish Vispute, Dinesh Kumar Sahu, Anil Rajput. An Empirical Comparison by Data Mining Classification Techniques for Diabetes Data Set. International Journal of Computer Applications. 131, 2 (December 2015), 6-11. DOI=10.5120/ijca2015907238

@article{ 10.5120/ijca2015907238,
author = { Nilesh Jagdish Vispute, Dinesh Kumar Sahu, Anil Rajput },
title = { An Empirical Comparison by Data Mining Classification Techniques for Diabetes Data Set },
journal = { International Journal of Computer Applications },
issue_date = { December 2015 },
volume = { 131 },
number = { 2 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 6-11 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume131/number2/23419-2015907238/ },
doi = { 10.5120/ijca2015907238 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

Abstract

Data mining is the process of extracting information from a data set and transforming it into an understandable structure for further use; it also discovers patterns in large data sets. It comprises a number of important techniques, such as preprocessing and classification. Classification is a technique based on supervised learning. Diabetes is a life-threatening disease prevalent in developed as well as developing countries such as India. The diabetic patient data set used for classification was developed by collecting data from a hospital repository and consists of 1865 instances with different attributes. The instances in the data set fall into two categories: blood tests and urine tests. In this paper we discuss various data mining algorithms that have been used for diabetes prediction. Data mining is a well-known technique used by health organizations to classify diseases such as diabetes and cancer in bioinformatics research. In the proposed approach we used WEKA with 10-fold cross-validation to evaluate the data and compare results. WEKA has an extensive collection of machine learning and data mining algorithms. We first classified the diabetic data set and then compared the different data mining techniques in WEKA through its Explorer, Knowledge Flow, and Experimenter interfaces. Furthermore, to validate our approach we used a diabetic data set with 108 instances, of which WEKA used 99 rows and 18 attributes, to predict the disease and to measure the accuracy of the different classification algorithms in order to find the best performer. The main objective of this paper is to classify the data, to help users extract useful information from it, and to make it easy to identify a suitable algorithm for an accurate predictive model.
From the findings of this paper it can be concluded that Naïve Bayes is the best-performing algorithm in terms of classification accuracy: it achieved the maximum accuracy (76.3021% correctly classified instances) and the maximum ROC area (0.819), had the least mean absolute error, and took the minimum time to build the model in both the Explorer and Knowledge Flow results.
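The evaluation procedure named in the abstract (10-fold cross-validation of a Naïve Bayes classifier, scored by accuracy and mean absolute error) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' WEKA workflow: the function names are hypothetical, the classifier is a simple Gaussian Naïve Bayes over numeric attributes, the folds are shuffled but not stratified, and the mean absolute error is assumed to be the average absolute difference between the predicted class probabilities and the 0/1 class indicators.

```python
import math
import random
from collections import defaultdict

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-attribute (mean, variance) per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant avoids division by zero for constant attributes.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def predict_proba(model, row):
    """Return normalized class probabilities for one instance."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        s = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_scores[label] = s
    top = max(log_scores.values())
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

def cross_validate(X, y, k=10, seed=0):
    """Return (accuracy, mean absolute error) pooled over all k test folds."""
    labels = sorted(set(y))
    correct, abs_err = 0, 0.0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            proba = predict_proba(model, X[i])
            correct += max(proba, key=proba.get) == y[i]
            # Assumed MAE definition: mean |p(c) - indicator(c)| over classes.
            abs_err += sum(abs(proba.get(c, 0.0) - (c == y[i]))
                           for c in labels) / len(labels)
    return correct / len(X), abs_err / len(X)
```

Calling `cross_validate(X, y, k=10)` on a list of numeric attribute rows `X` and class labels `y` returns the pooled accuracy and mean absolute error. Every instance is used for testing exactly once, which is why cross-validated accuracy is a fairer basis for comparing classifiers than accuracy on the training data.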

Index Terms

Computer Science

Information Sciences

Keywords

Weka Data mining Classification Diabetic Disease Prediction.