Research Article

Classification Imbalanced Data Sets: A Survey

by Shrouk El-Amir, Heba El-Fiqi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 23
Year of Publication: 2019
Authors: Shrouk El-Amir, Heba El-Fiqi
DOI: 10.5120/ijca2019919682

Shrouk El-Amir, Heba El-Fiqi. Classification Imbalanced Data Sets: A Survey. International Journal of Computer Applications 177, 23 (Dec 2019), 20-23. DOI=10.5120/ijca2019919682

@article{ 10.5120/ijca2019919682,
author = { Shrouk El-Amir, Heba El-Fiqi },
title = { Classification Imbalanced Data Sets: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2019 },
volume = { 177 },
number = { 23 },
month = { Dec },
year = { 2019 },
issn = { 0975-8887 },
pages = { 20-23 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume177/number23/31037-2019919682/ },
doi = { 10.5120/ijca2019919682 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Imbalanced data, a problem frequently encountered in real-world applications, can severely degrade the classification performance of machine learning algorithms. Various attempts have been made to classify imbalanced data sets. To cope with the class imbalance problem, the data can be rebalanced artificially, before training machine learning classifiers, by oversampling the minority class and/or under-sampling the majority class.
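As an illustration of the two rebalancing strategies the abstract refers to, and not code taken from the paper itself, the following minimal NumPy sketch performs random oversampling of the minority class and random under-sampling of the majority class on a synthetic 95:5 binary data set. The function names, the toy data, and the choice of plain random resampling (rather than, for example, SMOTE) are assumptions made purely for the example.

import numpy as np

def random_oversample(X, y, minority_label, seed=None):
    # Illustrative sketch: duplicate randomly drawn minority-class rows
    # until both classes have the same number of samples.
    # Assumes minority_label really is the smaller class.
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    return X[keep], y[keep]

def random_undersample(X, y, minority_label, seed=None):
    # Illustrative sketch: drop randomly chosen majority-class rows
    # until both classes have the same number of samples.
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    kept_majority = rng.choice(majority, size=len(minority), replace=False)
    keep = np.concatenate([kept_majority, minority])
    return X[keep], y[keep]

# Hypothetical example: a roughly 95:5 imbalanced binary data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.05).astype(int)   # label 1 is the minority class

X_over, y_over = random_oversample(X, y, minority_label=1, seed=0)
X_under, y_under = random_undersample(X, y, minority_label=1, seed=0)
print(np.bincount(y), np.bincount(y_over), np.bincount(y_under))

Oversampling keeps every original sample but repeats minority rows (risking overfitting to duplicates), while under-sampling discards majority rows (risking loss of information); which trade-off is preferable depends on the data set size and the imbalance ratio.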

Index Terms

Computer Science
Information Sciences

Keywords

Imbalance dataset, sampling, cost-sensitive learning, imbalance ratio