Research Article

Developing Diabetes Disease Classification Model using Sequential Forward Selection Algorithm

by Emrana Kabir Hashi, Md. Shahid Uz Zaman, Md. Rokibul Hasan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 5
Year of Publication: 2017
Authors: Emrana Kabir Hashi, Md. Shahid Uz Zaman, Md. Rokibul Hasan
10.5120/ijca2017916018

Emrana Kabir Hashi, Md. Shahid Uz Zaman, Md. Rokibul Hasan. Developing Diabetes Disease Classification Model using Sequential Forward Selection Algorithm. International Journal of Computer Applications. 180, 5 (Dec 2017), 1-6. DOI=10.5120/ijca2017916018

@article{ 10.5120/ijca2017916018,
author = { Emrana Kabir Hashi and Md. Shahid Uz Zaman and Md. Rokibul Hasan },
title = { Developing Diabetes Disease Classification Model using Sequential Forward Selection Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 180 },
number = { 5 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = { 6 },
url = { https://ijcaonline.org/archives/volume180/number5/28793-2017916018/ },
doi = { 10.5120/ijca2017916018 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:59:47.083915+05:30
%A Emrana Kabir Hashi
%A Md. Shahid Uz Zaman
%A Md. Rokibul Hasan
%T Developing Diabetes Disease Classification Model using Sequential Forward Selection Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 5
%P 1-6
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining techniques are used extensively in the healthcare sector to discover hidden patterns and relationships between patients’ records and their medical diagnoses. In disease prediction, high classification accuracy depends on a properly pre-processed dataset and a well-trained model. However, unimportant and irrelevant attributes in the training dataset may decrease predictive accuracy and increase the time complexity of the training phase. Feature selection is therefore frequently used in data mining to improve accuracy and efficiency. In this paper, a wrapper approach based on sequential forward selection is proposed to select an optimal and informative feature subset. Diabetes mellitus is a serious health problem whose complications can lead to death, so the aim of this research is to identify the significant attributes and classify a diabetes dataset. The proposed approach is used to build Decision Tree, K-Nearest Neighbor and Support Vector Machine classifiers, which achieve accuracies of 81.17%, 86.36% and 87.01% respectively. The results indicate that the proposed model achieves higher accuracy than similar existing models. The Pima Indian diabetes dataset is used in this research.
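
The wrapper idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the leave-one-out 1-nearest-neighbour scorer and the six-row toy dataset are all assumptions made for illustration, and the paper itself evaluates Decision Tree, K-Nearest Neighbor and Support Vector Machine classifiers on the Pima Indian diabetes dataset. The sketch starts from an empty subset, adds whichever feature most improves the wrapped classifier's score, and stops when no remaining feature helps.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Tuple[int, ...]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy wrapper search: grow the feature subset one feature at a time.

    `score` is any function that trains/evaluates a classifier on the given
    feature indices and returns a number where higher is better.
    """
    selected: List[int] = []
    best_score = float("-inf")
    limit = n_features if max_features is None else min(max_features, n_features)
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        # Evaluate the current subset extended by each remaining feature.
        scored = [(score(tuple(selected + [f])), f) for f in candidates]
        # Highest score wins; ties go to the lowest feature index.
        top_score, top_feature = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score


def loo_1nn_accuracy(
    X: Sequence[Sequence[float]], y: Sequence[int], features: Sequence[int]
) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices (hypothetical scorer)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = -1, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_d, best_j = d, j
        correct += int(y[best_j] == y[i])
    return correct / len(X)


# Toy data (assumption, not the Pima dataset): feature 0 separates the two
# classes, while features 1 and 2 are uninformative.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 1.0, 0.9],
    [0.9, 3.0, 0.4],
    [3.0, 5.1, 0.3],
    [3.2, 1.1, 0.8],
    [2.9, 3.1, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

selected, best = sequential_forward_selection(
    3, lambda fs: loo_1nn_accuracy(X, y, fs)
)
print(selected, best)
```

On this toy data the search keeps only feature 0, because neither of the other two features raises the leave-one-out accuracy any further. Passing a different `score` function is how a decision tree or SVM could be wrapped instead of the nearest-neighbour scorer used here.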

Index Terms

Computer Science
Information Sciences

Keywords

Classification, Feature Selection, Wrapper Approach, SFS, Pima Indian Diabetes Dataset, C4.5, KNN, SVM