Research Article

Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area

by B.I. Ayinla, Akande Oremei C.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 8
Year of Publication: 2024
10.5120/ijca2024923425

B.I. Ayinla, Akande Oremei C. Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area. International Journal of Computer Applications. 186, 8 (Feb 2024), 25-32. DOI=10.5120/ijca2024923425

@article{ 10.5120/ijca2024923425,
author = { B.I. Ayinla and Akande Oremei C. },
title = { Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2024 },
volume = { 186 },
number = { 8 },
month = { Feb },
year = { 2024 },
issn = { 0975-8887 },
pages = { 25-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number8/development-of-lr-multi-predicting-cross-validation-model-for-an-imbalanced-dataset-in-a-flood-susceptible-area/ },
doi = { 10.5120/ijca2024923425 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-29T03:28:31.579688+05:30
%A B.I. Ayinla
%A Akande Oremei C.
%T Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 8
%P 25-32
%D 2024
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Climate change has a profound impact on human well-being and health and, if not effectively managed, threatens the fundamental conditions for a good quality of life. Changes in the frequency and intensity of heavy rainfall events can alter the scale and occurrence of river floods. Events such as floods, droughts, and famines are therefore matters of global concern, and these complex changes, which can lead to disasters, call for comprehensive analysis to support effective prediction and response. Machine learning algorithms combined with cross-validation techniques have previously been used in flood forecasting to identify patterns across various indicators. Although traditional K-fold is an effective and widely used cross-validation technique, it is unclear how closely the data in each randomly generated fold converge to, or diverge from, the distribution of the full dataset. This research introduces a logistic regression multi-predicting cross-validation (LRMPCV) technique to address overfitting on imbalanced datasets. A flood dataset for Bangladesh containing 20,543 tuples, obtained from Kaggle, was used for the experiment and split into training and test sets in an 80:20 ratio. During 10-fold cross-validation, a logistic regression (LR) model assessed the distribution of data points in each fold for each of the three validation techniques compared. Random Forest (RF) and LR models were then built from the best folds in each round and used for prediction. Because the data are imbalanced, the area under the precision-recall curve (AUPRC) served as the primary evaluation metric. The new hybrid model shows a marked improvement over models built with traditional validation methods: the RF model achieved an AUPRC of 99%, compared with a previously reported 84.96% for traditional KNN and other models. This underscores the value of meticulous model validation for model selection.
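The fold-checking idea in the abstract can be illustrated with a small, dependency-free Python sketch. It is one possible reading of the procedure, not the authors' LRMPCV implementation: it holds out roughly 20% of the data as a test set, partitions the remaining 80% with plain K-fold and with stratified K-fold, reports each fold's minority-class share (how far a fold drifts from the overall class ratio), scores every fold with a logistic regression by AUPRC, and evaluates the model from the best fold on the test set. All function names, the gradient-descent LR, and the synthetic data are hypothetical stand-ins. Stratified K-fold is an assumed comparison scheme because the abstract does not name all three techniques it compares, the Random Forest model is not reproduced, and the average-precision estimate breaks score ties by input order rather than grouping them as scikit-learn does.

```python
# Hypothetical sketch: names, defaults, and data are illustrative, not the paper's code.
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.5, epochs=150):
    """Batch gradient-descent logistic regression without regularisation."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * v for wj, v in zip(w, xi)) + b) - yi
            gw = [g + err * v for g, v in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    w, b = model  # returns the positive-class probability for each row
    return [sigmoid(sum(wj * v for wj, v in zip(w, xi)) + b) for xi in X]


def average_precision(y_true, scores):
    """AUPRC estimated as average precision: mean precision at each positive's rank.
    Ties keep input order (sorted is stable); scikit-learn groups tied scores instead."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # undefined without positives; scored as worst case here
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


def random_folds(n, k, seed=0):
    # Plain K-fold: shuffle once and deal indices round-robin; class mix is left to chance.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[j::k] for j in range(k)]


def stratified_folds(labels, k, seed=0):
    # Stratified K-fold: deal each class separately so every fold keeps the class ratio.
    rng, folds = random.Random(seed), [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def score_folds(X, y, folds):
    """Hold out each fold in turn, fit LR on the rest; return (AUPRC, model) per fold."""
    results = []
    for held in folds:
        held_set = set(held)
        train = [i for i in range(len(y)) if i not in held_set]
        model = fit_logreg([X[i] for i in train], [y[i] for i in train])
        probs = predict(model, [X[i] for i in held])
        results.append((average_precision([y[i] for i in held], probs), model))
    return results


if __name__ == "__main__":
    # Hypothetical synthetic data (not the Kaggle Bangladesh set): ~10% positives.
    rng = random.Random(42)
    y = [1 if rng.random() < 0.1 else 0 for _ in range(400)]
    X = [[rng.gauss(1.5 * t, 1.0), rng.gauss(1.5 * t, 1.0)] for t in y]
    # 80:20 split: hold out one of five stratified folds (~20%) as the test set.
    test = set(stratified_folds(y, 5, seed=1)[0])
    tr, te = [i for i in range(len(y)) if i not in test], sorted(test)
    Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
    Xte, yte = [X[i] for i in te], [y[i] for i in te]
    for name, folds in [("k-fold", random_folds(len(ytr), 10)),
                        ("stratified", stratified_folds(ytr, 10))]:
        share = [round(sum(ytr[i] for i in f) / len(f), 2) for f in folds]
        res = score_folds(Xtr, ytr, folds)
        best = max(range(len(res)), key=lambda j: res[j][0])
        test_ap = average_precision(yte, predict(res[best][1], Xte))
        print(name, "minority share per fold:", share)
        print(f"{name}: best fold {best}, fold AUPRC {res[best][0]:.3f}, test AUPRC {test_ap:.3f}")
```

With real data one would more likely use scikit-learn's `StratifiedKFold`, `LogisticRegression`, `RandomForestClassifier`, and `average_precision_score`; the plain-Python version only keeps the sketch self-contained. Comparing the two printed minority-share lists shows why fold composition matters for skewed data: plain K-fold lets the positive rate vary from fold to fold, while stratified folds stay close to the overall rate.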

Index Terms

Computer Science
Information Sciences

Keywords

Climate Change, Machine Learning, Prediction Model, Multi-Cross-Validation, Skewed Datasets, Random Forest