Research Article

Data-Driven Diagnosis of Heart Disease

by Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 41
Year of Publication: 2020

Md. Istiaq Habib Khan, M. Rubaiyat Hossain Mondal. Data-Driven Diagnosis of Heart Disease. International Journal of Computer Applications. 176, 41 (Jul 2020), 46-54. DOI=10.5120/ijca2020920549

@article{ 10.5120/ijca2020920549,
author = { Md. Istiaq Habib Khan and M. Rubaiyat Hossain Mondal },
title = { Data-Driven Diagnosis of Heart Disease },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2020 },
volume = { 176 },
number = { 41 },
month = { Jul },
year = { 2020 },
issn = { 0975-8887 },
pages = { 46-54 },
numpages = {9},
url = { },
doi = { 10.5120/ijca2020920549 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:41:03.842305+05:30
%A Md. Istiaq Habib Khan
%A M. Rubaiyat Hossain Mondal
%T Data-Driven Diagnosis of Heart Disease
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 41
%P 46-54
%D 2020
%I Foundation of Computer Science (FCS), NY, USA

This paper focuses on the data-driven diagnosis of heart disease using three freely available datasets. The first dataset has 303 instances with 14 attributes, the second has 462 instances with 10 attributes, and the third has 70,000 instances with 12 attributes. The scikit-learn library of the Python programming language is used for data analysis. A univariate feature selection algorithm is applied to find the most valuable attributes and the risk factors associated with heart disease. Experimental results show that the most important attribute of the first dataset is the maximum heart rate achieved by a patient, while that of the second and third datasets is the patient's age. Next, heart disease is predicted using several machine learning algorithms, including support vector machine (SVM), decision tree, k-nearest neighbors (kNN), logistic regression, naïve Bayes, random forest, and majority voting. The training and testing portions of each dataset are separated using the holdout and cross-validation methods. The performance of the algorithms on the three datasets is evaluated in terms of testing accuracy, precision, recall, and F1-score. Majority voting that combines logistic regression, SVM, and naïve Bayes achieves the best accuracy, 88.89%, when applied to the first dataset.
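
The univariate feature selection step can be sketched with scikit-learn's `SelectKBest`. It scores every attribute independently against the class label and keeps the top k. This is a minimal sketch, not the authors' code. It rests on several assumptions:

* The scoring function is the ANOVA F-test (`f_classif`).
* k is 5.
* A synthetic dataset from `make_classification` stands in for the real data. It is shaped like the first dataset: 303 instances, with 13 predictors plus the label.
* The attribute names `attr_0` to `attr_12` are hypothetical placeholders.

```python
# Minimal sketch: univariate feature selection with scikit-learn.
# Assumptions: f_classif as the scoring function, k=5, and synthetic data
# standing in for the 303-instance heart disease dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif


def rank_features(X, y, feature_names, k=5):
    """Score each feature independently against the label and keep the top k."""
    selector = SelectKBest(score_func=f_classif, k=k)
    selector.fit(X, y)
    # Pair every feature name with its univariate F-score, highest score first.
    ranked = sorted(zip(feature_names, selector.scores_),
                    key=lambda pair: pair[1], reverse=True)
    return ranked, selector.transform(X)


if __name__ == "__main__":
    X, y = make_classification(n_samples=303, n_features=13, n_informative=5,
                               n_redundant=0, random_state=0)
    names = [f"attr_{i}" for i in range(X.shape[1])]  # hypothetical attribute names
    ranked, X_top = rank_features(X, y, names, k=5)
    for name, score in ranked:
        print(f"{name}: {score:.2f}")
    print("reduced shape:", X_top.shape)
```

With a real dataset, the names and arrays would come from its columns, producing a per-attribute ranking of the same kind. `chi2` is a common alternative scoring function, but it requires non-negative feature values.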

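The classifier comparison with holdout and cross-validation can also be sketched in scikit-learn. This is an illustrative sketch under stated assumptions, not the paper's pipeline:

* Synthetic data stands in for the real datasets.
* The holdout split is a stratified 70/30 split, combined with 10-fold cross-validation.
* Default hyperparameters are used throughout.
* Standardization is applied before the scale-sensitive models (SVM, kNN, logistic regression).

Majority voting is implemented as hard voting in `VotingClassifier` over logistic regression, SVM, and naive Bayes, which is the combination the abstract names. Its scores on synthetic data say nothing about the 88.89% reported for the first dataset.

```python
# Minimal sketch: holdout and cross-validated comparison of the classifiers named
# in the abstract, plus a hard majority-voting ensemble (LR + SVM + naive Bayes).
# Assumptions: synthetic data stands in for the real datasets; the 70/30 split,
# cv=10, feature scaling, and default hyperparameters are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def scaled(estimator):
    """Standardize features first; SVM, kNN and logistic regression are scale-sensitive."""
    return make_pipeline(StandardScaler(), estimator)


def build_models(seed=0):
    """Fresh, unfitted instances of every model, keyed by a short label."""
    return {
        "SVM": scaled(SVC()),
        "decision tree": DecisionTreeClassifier(random_state=seed),
        "kNN": scaled(KNeighborsClassifier()),
        "logistic regression": scaled(LogisticRegression(max_iter=1000)),
        "naive Bayes": GaussianNB(),
        "random forest": RandomForestClassifier(random_state=seed),
        # Hard voting: each base model casts one vote per sample; the majority label wins.
        "majority voting": VotingClassifier(
            estimators=[
                ("lr", scaled(LogisticRegression(max_iter=1000))),
                ("svm", scaled(SVC())),
                ("nb", GaussianNB()),
            ],
            voting="hard",
        ),
    }


def evaluate(X, y, test_size=0.3, folds=10, seed=0):
    """Holdout accuracy/precision/recall/F1 plus mean k-fold CV accuracy per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = {
            "accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred),
            "recall": recall_score(y_te, pred),
            "f1": f1_score(y_te, pred),
            # cross_val_score refits clones on each fold, independent of the holdout fit.
            "cv_accuracy": cross_val_score(model, X, y, cv=folds).mean(),
        }
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=303, n_features=13, random_state=0)
    for name, m in evaluate(X, y).items():
        print(f"{name:20s} acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
              f"rec={m['recall']:.3f} f1={m['f1']:.3f} cv={m['cv_accuracy']:.3f}")
```

Hard voting only needs each base model's predicted label, so `SVC` can be used without probability estimates. Soft voting would instead average `predict_proba` outputs and require `SVC(probability=True)`.
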
Index Terms

Computer Science
Information Sciences


feature selection, heart disease, SVM, logistic regression, recall, machine learning, disease prediction