Research Article

A Novel Adaptive Framework for Data Complexity Analysis in Imbalanced Binary Classification

by Debashis Roy, Anandarup Roy, Utpal Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 76
Year of Publication: 2025
DOI: 10.5120/ijca2025924677

Debashis Roy, Anandarup Roy, Utpal Roy. A Novel Adaptive Framework for Data Complexity Analysis in Imbalanced Binary Classification. International Journal of Computer Applications. 186, 76 (Apr 2025), 42-51. DOI=10.5120/ijca2025924677

@article{ 10.5120/ijca2025924677,
author = { Debashis Roy and Anandarup Roy and Utpal Roy },
title = { A Novel Adaptive Framework for Data Complexity Analysis in Imbalanced Binary Classification },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2025 },
volume = { 186 },
number = { 76 },
month = { Apr },
year = { 2025 },
issn = { 0975-8887 },
pages = { 42-51 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number76/a-novel-adaptive-framework-for-data-complexity-analysis-in-imbalanced-binary-classification/ },
doi = { 10.5120/ijca2025924677 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-04-05T01:33:32.995687+05:30
%A Debashis Roy
%A Anandarup Roy
%A Utpal Roy
%T A Novel Adaptive Framework for Data Complexity Analysis in Imbalanced Binary Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 76
%P 42-51
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Improving classification performance on imbalanced datasets, where the negative (majority) class far outnumbers the positive (minority) class, is one of the most important problems in machine learning. Researchers have addressed it with a variety of data-level and algorithm-level techniques. However, imbalance is not the sole factor compromising classification performance: class overlap, local instance ambiguity, and intrinsic structural complexity also make the task harder. Very few studies have focused on data complexity, especially in combination with class imbalance. This paper proposes a novel adaptive framework that measures data complexity through instance overlap, multiresolution overlap, structural overlap, kNN-based complexity for minority instances, and related measures. The framework selects complexity measures systematically according to how imbalanced a dataset is, and it suggests preprocessing steps and suitable models to make the classification task easier. The work includes a theoretical analysis with a lemma and a corollary, as well as specific steps for putting the ideas into practice. Because the framework is taxonomy-aware and provides actionable insights for improving imbalanced classification, it is useful for both researchers and practitioners.
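
To make the kind of measurement the abstract mentions more concrete, the following is a minimal sketch of a kNN-based complexity score for minority instances together with an imbalance ratio. It is not the authors' formulation, which this page does not give. The function names, the toy data, the measure labels, and the `ir_threshold` value are all assumptions chosen for illustration.

```python
import math


def imbalance_ratio(labels):
    """Majority-to-minority count ratio for binary labels (0 = majority, 1 = minority)."""
    n_min = sum(1 for y in labels if y == 1)
    n_maj = len(labels) - n_min
    return n_maj / n_min


def minority_knn_complexity(points, labels, k=3):
    """Mean fraction of majority-class points among the k nearest neighbours
    of each minority instance (0 = well separated, 1 = fully surrounded)."""
    scores = []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != 1:
            continue
        # Euclidean distance to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        scores.append(sum(1 for _, lab in dists[:k] if lab == 0) / k)
    return sum(scores) / len(scores)


def select_measures(ir, ir_threshold=1.5):
    """Hypothetical adaptive rule: add the minority-focused measure only when
    the dataset is imbalanced enough. The threshold is a placeholder, not a
    value from the paper."""
    measures = ["instance_overlap"]
    if ir >= ir_threshold:
        measures.append("knn_minority_complexity")
    return measures


# Toy data (assumed): six majority points near the origin, one minority point
# inside that cluster, and two minority points far away from it.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2),
          (0.4, 0.4), (5, 5), (5.2, 5.1)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]

ir = imbalance_ratio(labels)
score = minority_knn_complexity(points, labels, k=2)
print(ir, round(score, 3), select_measures(ir))
```

In this toy case the minority point inside the majority cluster has only majority neighbours, while each of the two distant minority points has one minority and one majority neighbour, so the mean score sits between the two extremes. A real analysis would likely use a vectorized neighbour search rather than this quadratic loop.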

Index Terms

Computer Science
Information Sciences
Pattern Recognition
Machine Learning
Imbalance learning
Algorithms

Keywords

Data Complexity
Imbalanced classification
Adaptive Measure Selection
Overlap
Theoretical Bounds