Research Article

A Novel Class Imbalance Learning using Ordering Points Clustering

by K. Nageswara Rao, T. Venkateswara Rao, D. Rajya Lakshmi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Number 16
Year of Publication: 2012
Authors: K. Nageswara Rao, T. Venkateswara Rao, D. Rajya Lakshmi
DOI: 10.5120/8128-1863

K. Nageswara Rao, T. Venkateswara Rao, D. Rajya Lakshmi. A Novel Class Imbalance Learning using Ordering Points Clustering. International Journal of Computer Applications. 51, 16 (August 2012), 33-42. DOI=10.5120/8128-1863

@article{ 10.5120/8128-1863,
author = { K. Nageswara Rao and T. Venkateswara Rao and D. Rajya Lakshmi },
title = { A Novel Class Imbalance Learning using Ordering Points Clustering },
journal = { International Journal of Computer Applications },
issue_date = { August 2012 },
volume = { 51 },
number = { 16 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 33-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume51/number16/8128-1863/ },
doi = { 10.5120/8128-1863 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:50:35.586192+05:30
%A K. Nageswara Rao
%A T. Venkateswara Rao
%A D. Rajya Lakshmi
%T A Novel Class Imbalance Learning using Ordering Points Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 16
%P 33-42
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data mining and knowledge discovery aim to uncover hidden, valuable knowledge in data sources. Traditional knowledge discovery algorithms are bottlenecked by the wide range of data sources now available. Class imbalance is one of the problems that arise when a data source provides unequal class distributions, i.e., when examples of one class in a training data set vastly outnumber examples of the other class(es). This paper proposes an under-sampling method that uses OPTICS, one of the best visualization-oriented clustering techniques, to handle the class imbalance problem. In the proposed approach, new data are then classified using the C4.5 algorithm as the base algorithm. The method is optimized by selecting, based on visualization algorithms, the most suitable clusters of the majority class for deletion. An experimental analysis is carried out over a wide range of highly imbalanced data sets, using the statistical tests suggested in the specialized literature. The results show that the proposed method outperforms other classic and recent models in terms of area under the ROC curve (AUC), F-measure, precision, TP rate, and TN rate.
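The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes a binary problem in which the minority class is the positive class (label 1), and the function names `imbalance_metrics` and `auc`, as well as the toy labels and scores, are hypothetical and chosen only for illustration.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix measures, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    tp_rate = tp / (tp + fn) if tp + fn else 0.0    # recall on the minority class
    tn_rate = tn / (tn + fp) if tn + fp else 0.0    # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    return {"tp_rate": tp_rate, "tn_rate": tn_rate,
            "precision": precision, "f_measure": f_measure}


def auc(y_true, scores, positive=1):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only): 2 minority and 8 majority examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print(imbalance_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

On this toy data the classifier labels 8 of 10 examples correctly, yet it finds only one of the two minority examples. That gap is why measures such as TP rate, F-measure, and AUC are reported for imbalanced data rather than accuracy alone.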

Index Terms

Computer Science
Information Sciences

Keywords

Classification, class imbalance, CIL-OP