A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Aman Gupta, Pranita Jain
10.5120/ijca2017913055

Aman Gupta and Pranita Jain. A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection. International Journal of Computer Applications 160(5):41-44, February 2017. BibTeX

@article{10.5120/ijca2017913055,
	author = {Aman Gupta and Pranita Jain},
	title = {A Map Reduce Hadoop Implementation of Random Tree Algorithm based on Correlation Feature Selection},
	journal = {International Journal of Computer Applications},
	issue_date = {February 2017},
	volume = {160},
	number = {5},
	month = {Feb},
	year = {2017},
	issn = {0975-8887},
	pages = {41-44},
	numpages = {4},
	url = {http://www.ijcaonline.org/archives/volume160/number5/27073-2017913055},
	doi = {10.5120/ijca2017913055},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Random Tree is a popular classification algorithm in machine learning, and feature reduction is one of the important research issues in big data. Most existing feature reduction algorithms face two challenging problems: on one hand, they rarely take granular computing into consideration; on the other hand, they still cannot handle massive data. Massive data processing is a difficult problem in the age of big data, and traditional feature reduction algorithms are generally time-consuming when applied to it. To speed up processing, we introduce a scalable, fast approximate attribute reduction algorithm based on MapReduce. We divide the original data into many small chunks and apply the reduction algorithm to each chunk. The reduction algorithm is based on correlation feature selection and generates decision rules using a Random Tree classifier. Finally, the feature reduction algorithm is implemented with data and task parallelism using the Hadoop MapReduce framework together with the WEKA environment. Experimental results demonstrate that the proposed classifier scales well and processes big data efficiently.
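The abstract describes a data-parallel scheme: a map step that runs correlation-based feature selection on each chunk independently, and a reduce step that combines the per-chunk results. The paper's actual Hadoop/WEKA code is not reproduced here; the following is a minimal Python sketch of that chunk-and-merge idea, in which all function names are hypothetical and a simple per-feature correlation-with-class score stands in for the full CFS subset merit:

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def map_chunk(chunk, labels, k=2):
    # "Map" step: score each feature in this chunk by |correlation with the
    # class| (a stand-in for CFS merit) and keep the top-k feature indices.
    n_feat = len(chunk[0])
    scores = [(abs(pearson([row[i] for row in chunk], labels)), i)
              for i in range(n_feat)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

def reduce_votes(per_chunk_selections, k=2):
    # "Reduce" step: keep the features selected by the most chunks.
    votes = Counter(i for sel in per_chunk_selections for i in sel)
    return [i for i, _ in votes.most_common(k)]
```

In a real Hadoop job, `map_chunk` would run inside a Mapper over an HDFS split and `reduce_votes` inside a Reducer; the reduced feature set would then feed WEKA's Random Tree classifier.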

References

  1. D. Borthakur. The Hadoop Distributed File System: Architecture and Design, 2007.
  2. Jiawei Han, Yanheng Liu, Xin Sun. A Scalable Random Forest Algorithm Based on MapReduce. IEEE, 2013.
  3. Q. He, F. Z. Zhuang, J. Li, Z. Z. Shi. Parallel implementation of classification algorithms based on MapReduce. RSKT, LNAI 6401, pp. 655-662, 2010.
  4. http://wiki.pentaho.com/display/DATAMINING/RandomTree
  5. M. Hall. Correlation-based Feature Selection for Machine Learning, 1999.
  6. Baris Senliol, Gokhan Gulgezen, et al. "Fast Correlation Based Filter with a different search strategy." 23rd International Symposium on Computer and Information Sciences (ISCIS'08), IEEE, 2008.
  7. Junbo Zhang, Tianrui Li, Da Ruan, Zizhe Gao, Chengbing Zhao. A parallel method for computing rough set approximations, 2012.
  8. https://archive.ics.uci.edu/ml/datasets.html

Keywords

Hadoop, Map Reduce, Random Tree, Big Data, Correlation.