Research Article

An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets

by Sarojini Balakrishnan, Ramaraj Narayanaswamy, Ilango Paramasivam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 29 - Number 5
Year of Publication: 2011
10.5120/3564-4903

Sarojini Balakrishnan, Ramaraj Narayanaswamy, Ilango Paramasivam. An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets. International Journal of Computer Applications. 29, 5 (September 2011), 1-6. DOI=10.5120/3564-4903

@article{ 10.5120/3564-4903,
author = { Sarojini Balakrishnan and Ramaraj Narayanaswamy and Ilango Paramasivam },
title = { An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 29 },
number = { 5 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume29/number5/3564-4903/ },
doi = { 10.5120/3564-4903 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:14:57.415661+05:30
%A Sarojini Balakrishnan
%A Ramaraj Narayanaswamy
%A Ilango Paramasivam
%T An Empirical Study on the Performance of Integrated Hybrid Prediction Model on the Medical Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 29
%N 5
%P 1-6
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Medical data are multidimensional, and the hundreds of independent features in these high-dimensional databases must be considered and analyzed to obtain valuable decision-making information for medical prediction. Most data mining methods depend on a set of features that define the behavior of the learning algorithm and directly or indirectly influence the complexity of the resulting models. Hence, to improve the efficiency and accuracy of mining tasks on high-dimensional data, the data must be preprocessed. Feature selection is a preprocessing step that aims to reduce the dimensionality of the data by selecting the most informative features that influence the diagnosis of the disease. We propose a feature-selection-embedded Hybrid Prediction model that combines two different functionalities of data mining: clustering and classification. The F-score feature selection method and k-means clustering select the optimal feature subsets of the medical datasets, which enhances the performance of the Support Vector Machine (SVM) classifier. The performance of the SVM classifier is empirically evaluated on the reduced feature subsets of the Diabetes, Breast Cancer and Heart Disease datasets. The proposed model is validated using four parameters, namely the Accuracy of the classifier, Area Under ROC Curve, Sensitivity and Specificity. The results show that the proposed feature-selection-embedded hybrid prediction model improves the predictive power of the classifier and reduces false positive and false negative rates. The proposed method achieves a predictive accuracy of 98.9427% for the diabetes dataset, 99% for the cancer dataset and 100% for the heart disease dataset, the highest predictive accuracy for these datasets compared to other models reported in the literature.
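
To make two of the quantities named in the abstract concrete, the sketch below computes an F-score for a single feature and derives accuracy, sensitivity and specificity from predicted labels. It is illustrative only and is not the authors' implementation: it assumes the binary-class F-score as defined by Chen and Lin (2005), assumes labels are coded 1 (diseased) and 0 (healthy), and uses made-up toy values. The function names `f_score` and `sensitivity_specificity` are hypothetical.

```python
from statistics import mean, variance


def f_score(values, labels):
    """F-score of one feature for a binary problem (labels coded 0 or 1).

    Assumed definition (Chen and Lin, 2005):
      numerator   = squared distance of each class mean from the overall mean
      denominator = sum of the two within-class sample variances
    Each class needs at least two samples for the variance to be defined.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances, divisor n - 1
    return between / within if within > 0 else float("inf")


def sensitivity_specificity(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


# Toy data (made up): feature_a separates the classes, feature_b does not.
labels = [1, 1, 1, 0, 0, 0]
feature_a = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]
feature_b = [1.0, 3.0, 2.0, 2.0, 1.0, 3.0]

score_a = f_score(feature_a, labels)
score_b = f_score(feature_b, labels)

# Toy predictions (made up) to exercise the evaluation measures.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
acc, sens, spec = sensitivity_specificity(y_true, y_pred)
```

A larger F-score suggests that a feature discriminates better between the two classes, so a filter of this kind would rank `feature_a` above `feature_b`. How the paper combines this ranking with k-means clustering and the SVM classifier is not specified in the abstract and is not reproduced here.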

Index Terms

Computer Science
Information Sciences

Keywords

Medical Data Mining, F-score, Support Vector Machine, Classifier Accuracy, Sensitivity, Specificity, Area Under ROC Curve