Research Article

Data Mining Approach to Analyze COVID-19 Dataset of Mexican Patients

by Waheeda Almayyan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 174 - Number 29
Year of Publication: 2021
Authors: Waheeda Almayyan
10.5120/ijca2021921217

Waheeda Almayyan. Data Mining Approach to Analyze COVID-19 Dataset of Mexican Patients. International Journal of Computer Applications. 174, 29 (Apr 2021), 30-40. DOI=10.5120/ijca2021921217

@article{ 10.5120/ijca2021921217,
author = { Waheeda Almayyan },
title = { Data Mining Approach to Analyze COVID-19 Dataset of Mexican Patients },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2021 },
volume = { 174 },
number = { 29 },
month = { Apr },
year = { 2021 },
issn = { 0975-8887 },
pages = { 30-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume174/number29/31863-2021921217/ },
doi = { 10.5120/ijca2021921217 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:23:26.798027+05:30
%A Waheeda Almayyan
%T Data Mining Approach to Analyze COVID-19 Dataset of Mexican Patients
%J International Journal of Computer Applications
%@ 0975-8887
%V 174
%N 29
%P 30-40
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The pandemic caused by the coronavirus (COVID-19) has forced governments to adopt different health policies to stop the infection and has inspired many research groups to work on patient data to understand the behaviour of the virus. This research proposes a two-phase prediction system with several learning algorithms to explore the COVID-19 dataset. Chi-square feature selection is employed in the first stage. In the second stage, the Cuckoo search and Grey Wolf Optimiser approaches are applied to select the most distinctive features. The proposed classification model is trained and tested with six machine learning algorithms. The proposed model achieved an accuracy of 96.5% on a sample of 95,839 patients, some of whose records are incomplete.
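
The first-stage Chi-square filter mentioned in the abstract can be illustrated with a minimal sketch. This is not the implementation used in the paper, which is not shown here. It assumes categorical features, uses only the Python standard library, and runs on made-up toy data. The column names (`pneumonia`, `diabetes`, `asthma`), the helper names, and the choice of keeping the top `k` features are all assumptions for illustration.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of one categorical feature against the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (feature value, class)
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per class
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected count if the feature and the class were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and keep the k highest-scoring ones."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (hypothetical, not taken from the Mexican COVID-19 dataset).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "pneumonia": [1, 1, 1, 0, 0, 0, 0, 0],
    "diabetes": [1, 0, 1, 0, 1, 0, 1, 0],
    "asthma": [1, 1, 0, 0, 1, 0, 0, 0],
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)
```

In a two-phase design like the one described, the features that survive this filter would then be passed to a second-stage search (Cuckoo search or Grey Wolf Optimiser in the paper) before classification. That second stage is not sketched here.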

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Chi-square feature selection, Grey Wolf Optimiser, Cuckoo search, COVID-19