Enhance Performance of Mapreduce Job on Hadoop Framework using Setup and Cleanup

Priyam Jain; Satyaranjan Patra; Pankaj Richhariya

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 155 - Number 8

Year of Publication: 2016

Authors: Priyam Jain, Satyaranjan Patra, Pankaj Richhariya

10.5120/ijca2016912400

Priyam Jain, Satyaranjan Patra, Pankaj Richhariya. Enhance Performance of Mapreduce Job on Hadoop Framework using Setup and Cleanup. International Journal of Computer Applications. 155, 8 (Dec 2016), 36-40. DOI=10.5120/ijca2016912400

@article{ 10.5120/ijca2016912400,

author = { Priyam Jain and Satyaranjan Patra and Pankaj Richhariya },

title = { Enhance Performance of Mapreduce Job on Hadoop Framework using Setup and Cleanup },

journal = { International Journal of Computer Applications },

issue_date = { Dec 2016 },

volume = { 155 },

number = { 8 },

month = { Dec },

year = { 2016 },

issn = { 0975-8887 },

pages = { 36-40 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume155/number8/26628-2016912400/ },

doi = { 10.5120/ijca2016912400 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-07T00:00:45.625146+05:30

%A Priyam Jain

%A Satyaranjan Patra

%A Pankaj Richhariya

%T Enhance Performance of Mapreduce Job on Hadoop Framework using Setup and Cleanup

%J International Journal of Computer Applications

%@ 0975-8887

%V 155

%N 8

%P 36-40

%D 2016

%I Foundation of Computer Science (FCS), NY, USA

Abstract

MapReduce is an effective programming model for large-scale, data-intensive computing applications, and Hadoop is a widely used open-source implementation of it. The communication overhead of transmitting large data sets greatly affects Hadoop's performance. To exploit data locality, Hadoop preferentially schedules tasks on nodes near the data they read, which reduces data transmission overhead and works well in homogeneous, dedicated MapReduce environments. However, practical considerations of cost and resource utilization make it common to maintain heterogeneous clusters or to share resources among multiple users, and it is difficult to take advantage of data locality in such heterogeneous or shared environments [1]. To improve MapReduce performance in these environments, this paper proposes a data prefetching mechanism that fetches data to the corresponding compute nodes in advance. Theoretical analysis shows that the proposal effectively reduces data transmission overhead. We also work on applying similar prefetching mechanisms to other phases of MapReduce and on predicting the execution nodes of tasks in cluster computing. The results show that the proposed system takes less execution time than the existing MapReduce job.
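
The abstract does not spell out how setup and cleanup are used, so the following is only a minimal sketch of one common pattern, under stated assumptions. In Hadoop's Mapper API, setup runs once per task before the first record and cleanup runs once after the last one. The plain-Python class below imitates that lifecycle (it is not the Hadoop API, and the class and function names are hypothetical) to show how per-task state can be initialized once in setup, accumulated in map, and emitted once in cleanup, so that fewer intermediate pairs need to be transmitted.

```python
from collections import Counter


class InMapperCombiningWordCount:
    """Plain-Python stand-in for a mapper with setup/map/cleanup hooks.

    Hypothetical illustration only; this is not the Hadoop Mapper class.
    """

    def setup(self):
        # Runs once per task, before any record is seen.
        self.counts = Counter()

    def map(self, line):
        # Runs once per input record; accumulates locally and emits nothing.
        self.counts.update(line.split())

    def cleanup(self):
        # Runs once per task, after the last record; emits aggregated pairs.
        return sorted(self.counts.items())


def run_task(mapper, records):
    """Drive the lifecycle in order: setup, map for each record, cleanup."""
    mapper.setup()
    for record in records:
        mapper.map(record)
    return mapper.cleanup()


def naive_map(records):
    """Baseline: emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for line in records for word in line.split()]


# A toy input split of two records (made-up data for illustration).
split = ["big data big cluster", "data node data"]
combined = run_task(InMapperCombiningWordCount(), split)
naive = naive_map(split)
print(combined)
print(len(naive), "pairs without combining,", len(combined), "with")
```

On this toy split the baseline emits one pair per word occurrence, while the setup/cleanup variant emits one pair per distinct word. Whether that matches the mechanism evaluated in the paper is an assumption; the sketch only illustrates why moving work into once-per-task hooks can cut intermediate data.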

References

Swathi Prabhu, Anisha P Rodrigues, Guru Prasad M S and Nagesh H R, “Performance Enhancement of Hadoop MapReduce Framework for Analyzing BigData”, IEEE, 978-1-4799-6085-9/15, (2015).
Hadoop Wiki Website, Apache, http://wiki.apache.org/hadoop
Tao Gu, Chuang Zuo, Qun Liao, Yulu Yang and Tao Li, “Improving MapReduce Performance Using Data Prefetching Mechanism in Heterogeneous or Shared Environments”, International Journal of Grid and Distributed Computing, (2013).
Rajashekhar M. Arasanal and Daanish U. Rumani, “Improve the MapReduce Performance through Complexity and Performance Based on Data Placement in Heterogeneous Hadoop Cluster”, Department of Computer Science, University of Illinois at Urbana-Champaign.
J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors, A. Manzanares and X. Qin, “Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters”, IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), (2010) April 19-23: Atlanta, USA.
Zhenhua Guo and Geoffrey Fox, “Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization”, IEEE, (2012).
Qi Chen, Cheng Liu and Zhen Xiao, “Improving MapReduce Performance Using Smart Speculative Execution Strategy”, IEEE, 0018-9340/13, (2013).
S. Khalil, S. A. Salem, S. Nassar and E. M. Saad, “Mapreduce Performance in Heterogeneous Environments: A Review”, International Journal of Scientific & Engineering Research, vol. 4, no. 4, (2013).
S. Seo, I. Jang, K. Woo, I. Kim, J. S. Kim and S. Maeng, “HPMR: Prefetching and Pre-shuffling in Shared MapReduce Computation Environment”, IEEE International Conference on Cluster Computing and Workshops, (2009) August 31-September 4: New Orleans, USA.
Z. Tang, J. Q. Zhou, K. L. Li and R. X. Li, “MTSD: A task scheduling algorithm for MapReduce base on deadline constraints”, IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), (2012) May 21-25: Shanghai, China.
M. Zaharia, D. Borthakur, J. Sen Sarma, K. Elmeleegy, S. Shenker and I. Stoica, “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”, Proceedings of the 5th European conference on Computer systems, (2010) April 13-16: Paris, France.
X. Zhang, Z. Zhong, S. Feng and B. Tu, “Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments”, IEEE 9th International Symposium on Parallel and Distributed Processing with Applications (ISPA), (2011) May 26-28: Busan, Korea.
C. Abad, Y. Lu and R. Campbell, “DARE: Adaptive Data Replication for Efficient Cluster Scheduling”, IEEE International Conference on Cluster Computing (CLUSTER), (2011) September 26-30: Austin, USA.

Index Terms

Computer Science

Information Sciences

Keywords

Big data Hadoop Mapreduce performance prefetching mechanism setup & cleanup