Research Article

A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection

by Aman Gupta, Pranita Jain
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 160 - Number 5
Year of Publication: 2017
10.5120/ijca2017913055

Aman Gupta, Pranita Jain. A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection. International Journal of Computer Applications 160, 5 (Feb 2017), 41-44. DOI=10.5120/ijca2017913055

@article{ 10.5120/ijca2017913055,
author = { Aman Gupta, Pranita Jain },
title = { A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection },
journal = { International Journal of Computer Applications },
issue_date = { Feb 2017 },
volume = { 160 },
number = { 5 },
month = { Feb },
year = { 2017 },
issn = { 0975-8887 },
pages = { 41-44 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume160/number5/27073-2017913055/ },
doi = { 10.5120/ijca2017913055 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Aman Gupta
%A Pranita Jain
%T A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 160
%N 5
%P 41-44
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Random Tree is a popular classifier for machine learning. Feature reduction is one of the important research issues in big data, and most existing feature reduction algorithms face two challenging problems: on one hand, they have rarely taken granular computing into consideration; on the other hand, they still cannot deal with massive data. Massive data processing is a difficult problem in the age of big data, and traditional feature reduction algorithms are generally time-consuming when facing it. To speed up processing, we introduce a scalable, fast approximate attribute reduction algorithm built on MapReduce. We divide the original data into many small chunks and apply the reduction algorithm to each chunk. The reduction algorithm is based on correlation feature selection and generates decision rules using the Random Tree classifier. Finally, the feature reduction algorithm is implemented with data and task parallelism using the Hadoop MapReduce framework within the WEKA environment. Experimental results demonstrate that the proposed classifier scales well and processes big data efficiently.
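The chunk-wise map/reduce scheme the abstract describes can be sketched as follows. This is a minimal illustration only, not the paper's implementation: it assumes a plain Pearson correlation with the class label as a crude stand-in for WEKA's CFS merit score, and plain Python in place of the Hadoop/WEKA stack; the names `map_chunk`, `reduce_scores`, and `select_features` are hypothetical.

```python
# Sketch of chunk-wise feature reduction in map/reduce style.
# Assumption: each row is a tuple of numeric features with the class
# label as the last element. Correlation-with-label stands in for CFS.
from functools import reduce

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / (vx * vy) ** 0.5

def map_chunk(chunk):
    """Map step: score each feature in one data chunk by the absolute
    correlation between its column and the class-label column."""
    cols = list(zip(*chunk))          # transpose rows -> columns
    labels = list(cols[-1])           # last column is the class
    return {i: abs(pearson(list(col), labels))
            for i, col in enumerate(cols[:-1])}

def reduce_scores(a, b):
    """Reduce step: sum per-feature scores from two partial results."""
    return {i: a.get(i, 0.0) + b.get(i, 0.0) for i in set(a) | set(b)}

def select_features(chunks, threshold=0.5):
    """Drive the map and reduce phases, then keep features whose
    average score across all chunks meets the threshold."""
    partials = [map_chunk(c) for c in chunks]
    totals = reduce(reduce_scores, partials)
    n = len(chunks)
    return sorted(i for i, s in totals.items() if s / n >= threshold)
```

In the actual system each `map_chunk` call would run as a Hadoop map task over an HDFS block, the reducer would merge the partial scores, and the reduced feature set would then be handed to WEKA's RandomTree to build the decision rules.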

Index Terms

Computer Science
Information Sciences

Keywords

Hadoop, Map Reduce, Random Tree, Big Data, Correlation