Research Article

A Survey on Methods for Solving Data Imbalance Problem for Classification

by Arpit Singh, Anuradha Purohit
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 127 - Number 15
Year of Publication: 2015
Authors: Arpit Singh, Anuradha Purohit
10.5120/ijca2015906677

Arpit Singh, Anuradha Purohit. A Survey on Methods for Solving Data Imbalance Problem for Classification. International Journal of Computer Applications. 127, 15 (October 2015), 37-41. DOI=10.5120/ijca2015906677

@article{ 10.5120/ijca2015906677,
author = { Arpit Singh and Anuradha Purohit },
title = { A Survey on Methods for Solving Data Imbalance Problem for Classification },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 127 },
number = { 15 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 37-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume127/number15/22809-2015906677/ },
doi = { 10.5120/ijca2015906677 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:18:09.091540+05:30
%A Arpit Singh
%A Anuradha Purohit
%T A Survey on Methods for Solving Data Imbalance Problem for Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 127
%N 15
%P 37-41
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In classification, “data imbalance” is a well-established phenomenon in which a data set has an unbalanced class distribution. A data set is called unbalanced if it contains at least one class that is represented by very few examples. A range of solutions has been proposed for this problem, including data sampling, cost evaluation of the model, bagging, boosting, and Genetic Programming (GP) based methods. This paper surveys the methods researchers have introduced to handle data imbalance and improve classification performance, and compares them on the basis of their advantages and disadvantages.
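
As a minimal illustration of the data sampling family named in the abstract, the sketch below measures how unbalanced a label set is and then applies random oversampling, one common sampling technique. It is not taken from the paper: the function names `imbalance_ratio` and `random_oversample`, the fixed seed, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until
    every class has as many examples as the largest class.

    Hypothetical helper for illustration only, not the surveyed authors' method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original examples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data (assumed): 8 majority examples (label 0), 2 minority examples (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
```

Oversampling keeps every original example but repeats minority examples, which can encourage overfitting to them; undersampling the majority class is the usual alternative and discards data instead.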

Index Terms

Computer Science
Information Sciences

Keywords

Classification, data imbalance, genetic programming, boosting, bagging, sampling