Boosting the Performance of MapReduce by Better Resource Utilization in Cluster

Pooja Malikwade; S.B.Jadhav

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 112 - Number 16

Year of Publication: 2015

Authors: Pooja Malikwade, S.B.Jadhav

10.5120/19753-1535

Pooja Malikwade, S.B.Jadhav. Boosting the Performance of MapReduce by Better Resource Utilization in Cluster. International Journal of Computer Applications. 112, 16 (February 2015), 29-33. DOI=10.5120/19753-1535

@article{10.5120/19753-1535,
  author = {Pooja Malikwade and S. B. Jadhav},
  title = {Boosting the Performance of MapReduce by Better Resource Utilization in Cluster},
  journal = {International Journal of Computer Applications},
  issue_date = {February 2015},
  volume = {112},
  number = {16},
  month = {February},
  year = {2015},
  issn = {0975-8887},
  pages = {29-33},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume112/number16/19753-1535/},
  doi = {10.5120/19753-1535},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:49:40.737786+05:30
%A Pooja Malikwade
%A S.B.Jadhav
%T Boosting the Performance of MapReduce by Better Resource Utilization in Cluster
%J International Journal of Computer Applications
%@ 0975-8887
%V 112
%N 16
%P 29-33
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

MapReduce implementations are widely used to process large data sets, running computations in parallel to speed up job execution. Skew, which arises from large indivisible records or an uneven distribution of data across tasks, slows down job execution and lowers cluster throughput. We propose an automatic skew-handling system that is compatible with the MapReduce framework and transparent to users. The system uses idle resources in the cluster to handle skew through task repartitioning, and the output order is preserved after repartitioning. It requires no extra input from users and imposes minimal overhead when no skew is present.
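The abstract does not give the repartitioning algorithm itself, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes the unprocessed records of a slow task can be split into contiguous chunks, one per idle slot, and that concatenating the partial outputs in chunk order reproduces the output order of the original task. The names repartition, run_with_repartitioning, and idle_slots are hypothetical and chosen for illustration; the chunks are processed sequentially here, whereas a real cluster would run them on separate nodes.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def repartition(remaining: Sequence[T], idle_slots: int) -> List[Sequence[T]]:
    """Split a slow task's unprocessed records into contiguous chunks.

    Hypothetical helper: one chunk per idle slot, with sizes differing
    by at most one record. Contiguity is what lets the caller restore
    the original output order by simple concatenation.
    """
    if idle_slots < 1:
        raise ValueError("need at least one idle slot")
    n = len(remaining)
    k = min(idle_slots, n) or 1  # never create more chunks than records
    base, extra = divmod(n, k)
    chunks = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        chunks.append(remaining[start:start + size])
        start += size
    return chunks


def run_with_repartitioning(
    records: Sequence[T], process: Callable[[T], R], idle_slots: int
) -> List[R]:
    """Process the records chunk by chunk and merge the results in chunk order."""
    chunks = repartition(records, idle_slots)
    # In a cluster each chunk would run on a different idle slot;
    # this sketch processes them one after another.
    partial_outputs = [[process(r) for r in chunk] for chunk in chunks]
    # Concatenating in chunk order preserves the original record order.
    return [out for part in partial_outputs for out in part]
```

With no idle capacity beyond a single slot, the sketch degenerates to one chunk and behaves like the unsplit task, which mirrors the abstract's claim of low overhead when repartitioning is not needed.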

Index Terms

Computer Science

Information Sciences

Keywords

Data skew, MapReduce, parallel database systems, performance gain, skew handling