Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area

B.I. Ayinla; Akande Oremei C.

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 186 - Number 8

Year of Publication: 2024

Authors: B.I. Ayinla, Akande Oremei C.

DOI: 10.5120/ijca2024923425

B.I. Ayinla, Akande Oremei C. Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area. International Journal of Computer Applications. 186, 8 (Feb 2024), 25-32. DOI=10.5120/ijca2024923425

@article{10.5120/ijca2024923425,
  author = {B.I. Ayinla and Akande Oremei C.},
  title = {Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area},
  journal = {International Journal of Computer Applications},
  issue_date = {Feb 2024},
  volume = {186},
  number = {8},
  month = {Feb},
  year = {2024},
  issn = {0975-8887},
  pages = {25-32},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume186/number8/development-of-lr-multi-predicting-cross-validation-model-for-an-imbalanced-dataset-in-a-flood-susceptible-area/},
  doi = {10.5120/ijca2024923425},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-29T03:28:31.579688+05:30

%A B.I. Ayinla

%A Akande Oremei C.

%T Development of LR-Multi Predicting Cross-Validation Model for an Imbalanced Dataset in a Flood Susceptible Area

%J International Journal of Computer Applications

%@ 0975-8887

%V 186

%N 8

%P 25-32

%D 2024

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Climate change has a profound impact on human health and well-being and, if not effectively managed, threatens the fundamental aspects of a good quality of life. Changes in the frequency and intensity of heavy rainfall events can shift the scale and occurrence of river floods, altering how floods happen. Floods, droughts, and famines are therefore matters of global concern. These complex changes bring calamities and require comprehensive analysis for effective prediction and counteraction. Machine learning algorithms and cross-validation techniques have previously been used in flood forecasting to identify patterns from various indicators. Although traditional k-fold is an effective and commonly used cross-validation technique, the structure of each fold produced during randomization, in terms of convergence and divergence of the dataset, is unclear. This research introduces logistic regression multi-predicting cross-validation (LRMPCV) to address overfitting in imbalanced datasets. The experiment used a flooding dataset for Bangladesh, comprising 20,543 tuples, obtained from Kaggle. The data were divided into training and test sets at a ratio of 80:20. A Logistic Regression (LR) algorithm checks the distribution of data points in each fold for the three validation techniques during the 10-fold validation processes. Random Forest (RF) and LR models were then built from the best folds in each round for prediction. Because the data are imbalanced, the area under the precision-recall curve (AUPRC) was the critical metric. The new hybridized model shows a marked improvement over models built with traditional validation methods: the Random Forest achieved 99% AUPRC, against the previous result of 84.96% from the traditional KNN and other models. This underscores the power of meticulous model validation in enhancing model selection.
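Two ideas in the abstract can be illustrated with a minimal sketch: checking how the classes are distributed across the folds of a 10-fold split, and scoring with AUPRC rather than accuracy when the data are imbalanced. This is not the LRMPCV procedure itself, whose details are not given here. The function names, the toy labels with 5% positives, and the toy scores are all assumptions made for illustration, and AUPRC is approximated by step-wise average precision.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign every index to one of k folds, spreading each class evenly.

    Hypothetical helper: a plain stratified split, not the authors' method.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Assumes binary labels (1 = positive) and no tied scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos if total_pos else 0.0


# Toy imbalanced labels (assumed): 10 positives among 200 samples.
labels = [1] * 10 + [0] * 190

# Inspect the class distribution inside each of the 10 folds.
folds = stratified_folds(labels, k=10)
fold_positive_rates = [sum(labels[i] for i in fold) / len(fold) for fold in folds]

# A model that never predicts the positive class still looks accurate.
always_negative_accuracy = sum(1 for y in labels if y == 0) / len(labels)

# Average precision on a small ranked example with distinct scores.
toy_ap = average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.5])

print(fold_positive_rates)
print(always_negative_accuracy, toy_ap)
```

In this toy setting every fold keeps the same share of positives as the full data, while a classifier that always predicts the majority class reaches high accuracy without finding a single positive. That gap is the usual reason for preferring a precision-recall metric on skewed data.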

References

Baldini, G., & Geneiatakis, D. (2019). A Performance Evaluation on Distance Measures in KNN for Mobile Malware Detection. In Proceedings of the 2019 International Conference on Control, Decision and Information Technologies (CoDIT) (pp. 193-198). doi: 10.1109/CoDIT.2019.8820510.
Bajpai, D., & He, L. (2020). Evaluating KNN Performance on WESAD Dataset. In 2020 12th International Conference on Computational Intelligence and Communication Networks (CICN) (pp. 60-62). Bhimtal, India. doi: 10.1109/CICN49253.2020.9242568.
Brownlee, J. (2020, January 1). Failure of Classification Accuracy for Imbalanced Class Distributions - MachineLearningMastery.com. Machine Learning Mastery. Retrieved August 23, 2023, from https://machinelearningmastery.com/failure-of-accuracy-for-imbalanced-class-distributions/
Gauhar, N., Das, S., & Moury, K. S. (2021). Prediction of Flood in Bangladesh using k-Nearest Neighbors Algorithm. In 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST) (pp. 357-361). Dhaka, Bangladesh. doi: 10.1109/ICREST51555.2021.9331199.
Gupta, I., Sharma, V., Kaur, S., & Singh, A. (2022). PCA-RF: An Efficient Parkinson's Disease Prediction Model based on Random Forest Classification.
He, J., & Fan, X. (2019). Evaluating the Performance of the K-fold Cross-Validation Approach for Model Selection in Growth Mixture Modeling. Structural Equation Modeling: A Multidisciplinary Journal, 26(1), 66-79. doi: 10.1080/10705511.2018.1500140.
Huntingford, C., Jeffers, E. S., Bonsall, M. B., Christensen, H. M., Lees, T., & Yang, H. (2019). Machine learning and artificial intelligence to aid climate change research and preparedness. Environmental Research Letters, 14(12), 124007. https://doi.org/10.1088/1748-9326/ab4e55
Jung, Y. (2018). Multiple predicting K-fold cross-validation for model selection. Journal of Nonparametric Statistics. https://doi.org/10.1080/10485252.2017.1404598.
Kim, W. S., & Hong, J. (2022). An Application of Machine Learning Algorithms and a Stacking Ensemble Method for Mass Appraisal of Apartments. Han-Guk Gyeong-Yeong Gonghak Hoeji. https://doi.org/10.35373/kmes.27.2.6
Ladi, T., Jabalameli, S., & Sharifi, A. (2022). Applications of machine learning and deep learning methods for climate change mitigation and adaptation. Environment and Planning B: Urban Analytics and City Science, 49, 239980832210852. https://doi.org/10.1177/23998083221085281.
Lieber, M., Chin-Hong, P., Kelly, K., Dandu, M., & Weiser, S. D. (2022). A systematic review and meta-analysis assessing the impact of droughts, flooding, and climate variability on malnutrition. Global Public Health, 17(1), 68-82. https://doi.org/10.1080/17441692.2020.1860247
Merriam-Webster. (n.d.). Flood. In Merriam-Webster.com dictionary. Retrieved August 24, 2023, from https://www.merriam-webster.com/dictionary/flood
Panahi, M., Jaafari, A., Shirzadi, A., Shahabi, H., Rahmati, O., Omidvar, E., Bui, D., & Lee, S. (2020). Deep learning neural networks for spatially explicit prediction of flash flood probability. Geoscience Frontiers. https://doi.org/10.1016/j.gsf.2020.09.007.
Prusty, S., Patnaik, S., & Dash, S. K. (2022). SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. Frontiers in Nanotechnology, 4, 972421. doi: 10.3389/fnano.2022.972421
Tembusai, Z. R., Mawengkang, H., & Zarlis, M. (2021). K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification. Retrieved from https://www.neliti.com/publications/396954/k-nearest-neighbor-with-k-fold-cross-validation-and-analytic-hierarchy-process-o
Ulker, E. (2022). Forecasting of Precipitation by Machine Learning Algorithms to Adapt Climate Change. Journal of Environmental and Natural Studies, 4(2), 109-118. DOI: https://doi.org/10.53472/jenas.1150975.

Index Terms

Computer Science

Information Sciences

Keywords

Climate Change, Machine Learning, Prediction Model, Multi-Cross-Validation, Skewed datasets, Random Forest