Research Article

A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach

by Mohammad Imran, Vaddi Srinivasa Rao
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 179 - Number 33
Year of Publication: 2018
Authors: Mohammad Imran, Vaddi Srinivasa Rao
DOI: 10.5120/ijca2018916743

Mohammad Imran, Vaddi Srinivasa Rao. A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach. International Journal of Computer Applications. 179, 33 (Apr 2018), 18-21. DOI=10.5120/ijca2018916743

@article{ 10.5120/ijca2018916743,
author = { Mohammad Imran and Vaddi Srinivasa Rao },
title = { A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2018 },
volume = { 179 },
number = { 33 },
month = { Apr },
year = { 2018 },
issn = { 0975-8887 },
pages = { 18-21 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume179/number33/29210-2018916743/ },
doi = { 10.5120/ijca2018916743 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:57:18.765853+05:30
%A Mohammad Imran
%A Vaddi Srinivasa Rao
%T A Novel Technique on Class Imbalance Big Data using Analogous under Sampling Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 33
%P 18-21
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, we propose a hybrid Random Under-Sampled Imbalance Big Data (USIBD) framework for extracting knowledge from class-imbalanced big data. We also propose a novel under-sampling method for the base learner to handle the dynamic class-imbalance problem caused by the gradual evolution of classes in big data. The proposed USIBD knowledge discovery framework is robust and less sensitive to outliers when the data are non-uniformly distributed. Empirical studies demonstrate the effectiveness of USIBD across a range of class-imbalanced big data scenarios in comparison with existing methods.
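
The abstract does not spell out the USIBD procedure, so the following is only a minimal sketch of plain random under-sampling, the general idea that under-sampling methods build on. It is not the authors' method. The function name `random_undersample`, the `seed` parameter, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the rarest class's size.

    Generic random under-sampling (an illustrative assumption),
    not the USIBD method described in the paper.
    """
    rng = random.Random(seed)
    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Target size: number of samples in the smallest (minority) class.
    target = min(len(group) for group in by_class.values())
    out_x, out_y = [], []
    for y, group in by_class.items():
        # Sample without replacement; the minority class is kept whole.
        for x in rng.sample(group, target):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data (hypothetical): 10 minority samples (label 1), 90 majority (label 0).
samples = list(range(100))
labels = [1 if i < 10 else 0 for i in samples]
bal_x, bal_y = random_undersample(samples, labels)
print(Counter(bal_y))  # both classes now have 10 samples each
```

Discarding majority samples uniformly at random is the simplest baseline. It ignores outliers and any drift in class proportions over time, which are the two issues the abstract says USIBD is designed to address.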

Index Terms

Computer Science
Information Sciences

Keywords

Classification, Big data, Imbalanced data, Under Sampling, USIBD