International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 8
Year of Publication: 2024
Authors: B.I. Ayinla, Akande Oremei C.
10.5120/ijca2024923425
B.I. Ayinla, Akande Oremei C. Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area. International Journal of Computer Applications. 186, 8 (Feb 2024), 25-32. DOI=10.5120/ijca2024923425
Climate change has a profound impact on human well-being and health and, if not effectively managed, threatens the fundamental aspects of a good quality of life. Changes in the frequency and intensity of heavy rainfall events can shift the scale and occurrence of river floods. Floods, droughts, and famines raise global concern, and these complex changes bring calamities that require comprehensive analysis for effective prediction and mitigation. Machine learning algorithms and cross-validation techniques have previously been used in flood forecasting to identify patterns across various indicators. Although traditional K-fold is an effective and widely used cross-validation technique, the structure of each fold after randomization, in terms of how the data converge or diverge, is unclear. This research introduces a logistic regression multi-predicting cross-validation (LRMPCV) model to address overfitting in imbalanced datasets. The experiment used 20,543 tuples from a Bangladesh flooding dataset obtained from Kaggle, divided into training and test sets in an 80:20 ratio. A Logistic Regression (LR) algorithm checks the distribution of data points in each fold for the three validation techniques during the 10-fold validation process. Random Forest (RF) and LR models were then built from the best fold in each round for prediction. Because the data are imbalanced, the area under the precision-recall curve (AUPRC) was the critical metric. Compared with models built using traditional validation methods, the new hybridized model shows a marked improvement: the Random Forest achieved 99% AUPRC, against the previous result of 84.96% from the traditional KNN and other models. This underscores the value of meticulous model validation in enhancing model selection.
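The abstract gives no implementation details, so the sketch below is not the authors' LRMPCV procedure. It only illustrates two ingredients the abstract relies on, under stated assumptions: how plain shuffled K-fold can spread the minority (flood) class unevenly across folds, contrasted with stratified K-fold (a standard alternative swapped in here for comparison), and how AUPRC can be computed as step-wise average precision. All function names are hypothetical, labels are assumed to be 0/1 integers, and the data are synthetic, not the Bangladesh dataset.

```python
import random
from typing import List, Sequence


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Plain K-fold (hypothetical helper): shuffle all row indices once, then
    deal them into k folds. Labels are ignored, so minority rows may cluster."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_kfold_indices(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Stratified K-fold (hypothetical helper): shuffle each class separately,
    then deal its rows round-robin so every fold gets a near-equal share."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(members)
        for j, row in enumerate(members):
            folds[j % k].append(row)
    return folds


def positives_per_fold(y: Sequence[int], folds: List[List[int]]) -> List[int]:
    """Count minority-class (label 1) rows in each fold."""
    return [sum(y[i] for i in fold) for fold in folds]


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC as step-wise average precision over distinct score thresholds:
    AP = sum_n (R_n - R_{n-1}) * P_n, where R is recall and P is precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("AUPRC is undefined when there are no positive labels")
    # Rank rows from most to least confident positive prediction.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        # Consume every row tied at this score before adding a curve point.
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Synthetic, imbalanced labels: 50 positives (5%) among 1000 rows.
    y = [1] * 50 + [0] * 950
    print("plain K-fold positives per fold:     ",
          positives_per_fold(y, kfold_indices(len(y), 10)))
    print("stratified K-fold positives per fold:",
          positives_per_fold(y, stratified_kfold_indices(y, 10)))
    # Toy classifier scores; the average precision here is 5/6.
    print("AUPRC:", average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

With shuffled K-fold the per-fold positive counts typically drift around the expected 5, while stratification fixes them at 5 per fold. A no-skill classifier scores an AUPRC near the positive prevalence (0.05 here), whereas always predicting "no flood" would still reach 95% accuracy, which is why AUPRC is the more informative metric on imbalanced data. In practice, scikit-learn's `KFold`, `StratifiedKFold`, and `average_precision_score` cover the same ground.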