Medical Data Classification using Machine Learning Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Koby Bond, Alaa Sheta

Koby Bond and Alaa Sheta. Medical Data Classification using Machine Learning Techniques. International Journal of Computer Applications 183(6):1-8, June 2021.

BibTeX:

@article{10.5120/ijca2021921339,
	author = {Koby Bond and Alaa Sheta},
	title = {Medical Data Classification using Machine Learning Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {June 2021},
	volume = {183},
	number = {6},
	month = {Jun},
	year = {2021},
	issn = {0975-8887},
	pages = {1-8},
	numpages = {8},
	url = {},
	doi = {10.5120/ijca2021921339},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}


Medical data classification is a challenging and significant problem in data mining, explored by numerous researchers in various real-world applications. It can be defined as the process of splitting (i.e., categorizing) data into appropriate groups (i.e., classes) based on their common characteristics. In this research, we provide a detailed comparison of several machine learning classification approaches and explore their predictive accuracy on several datasets. The approaches compared include Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Decision Trees (DT). The quality of the developed classifiers was evaluated using several criteria, such as Precision, Recall, and F-Measure. Several datasets from the UCI Machine Learning Repository, including the Pima Indians Diabetes and Breast Cancer Coimbra datasets, were used in this study. The experimental results reveal that the ANN-based classifier was the most accurate in all cases and achieved the highest area under the ROC curve.
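The evaluation criteria named in the abstract can be computed directly from a classifier's outputs. The following is a minimal Python sketch, not the authors' implementation, assuming binary labels with 1 as the positive class and a 0.5 decision threshold on the classifier scores; the function names and the sample labels and scores are hypothetical illustrations, not data from the study. ROC area is computed with the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half.

```python
from typing import Sequence, Tuple

# Minimal sketch (not the authors' code): binary labels, class 1 treated as positive.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for the chosen positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f_measure(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F-Measure = 2PR/(P+R)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def roc_auc(y_true, scores, positive=1):
    """ROC area as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one positive and one negative example")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores, not results from the study.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(precision_recall_f_measure(y_true, y_pred))  # (0.75, 0.75, 0.75)
    print(roc_auc(y_true, scores))  # 0.9375
```

Precision, Recall, and F-Measure depend on the chosen threshold, while the ROC area summarizes ranking quality over all thresholds, which is why a classifier can lead on one criterion and not another. The pairwise loop in `roc_auc` is quadratic in the number of examples; that is adequate for datasets of a few hundred records, though a sort-based version would scale better.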

Keywords: Medical Data Classification, Machine Learning, Neural Networks, Support Vector Machines, Decision Trees