Improving Current Hadoop MapReduce Workflow and Performance

Hamoud Alshammari; Jeongkyu Lee; Hassan Bajwa

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 116 - Number 15

Year of Publication: 2015

Authors: Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa

DOI: 10.5120/20414-2828

Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa. Improving Current Hadoop MapReduce Workflow and Performance. International Journal of Computer Applications. 116, 15 (April 2015), 38-42. DOI=10.5120/20414-2828

@article{ 10.5120/20414-2828,
author = { Hamoud Alshammari, Jeongkyu Lee, Hassan Bajwa },
title = { Improving Current Hadoop MapReduce Workflow and Performance },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 15 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 38-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number15/20414-2828/ },
doi = { 10.5120/20414-2828 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:57:13.840192+05:30
%A Hamoud Alshammari
%A Jeongkyu Lee
%A Hassan Bajwa
%T Improving Current Hadoop MapReduce Workflow and Performance
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 15
%P 38-42
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This study proposes and implements an enhanced Hadoop MapReduce workflow that improves on the performance of current Hadoop MapReduce. The architecture speeds up the processing of BigData by tuning several parameters of the processing jobs. BigData must be divided into many datasets, or blocks, and distributed across many nodes in the cluster, so that tasks can access and process these blocks in parallel. However, reading the same datasets every time a job is executed causes a data overloading problem, so we modified the current MapReduce workflow to reduce the amount of data read by related jobs. This work uses bioinformatics DNA datasets to implement the solution.
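
The block-and-parallel-task idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Hadoop: plain Python threads stand in for cluster nodes, and the function names, the block size, and the example sequence are all hypothetical. It only shows the general pattern of splitting a DNA sequence into blocks, counting a motif in each block in parallel (map), and summing the partial counts (reduce).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def split_into_blocks(sequence, block_size, overlap):
    """Split a sequence into blocks of `block_size` characters.

    Each block also carries up to `overlap` characters of the following
    block, so a motif that straddles a block boundary is still seen.
    """
    return [sequence[start:start + block_size + overlap]
            for start in range(0, len(sequence), block_size)]


def count_motif_in_block(block, motif):
    """Map step: count (possibly overlapping) motif occurrences in one block."""
    m = len(motif)
    return sum(1 for i in range(len(block) - m + 1) if block[i:i + m] == motif)


def count_motif_parallel(sequence, motif, block_size=5):
    """Count motif occurrences by processing blocks in parallel.

    Assumes block_size >= 1 and a non-empty motif. With an overlap of
    len(motif) - 1, every match is counted exactly once, by the block
    in which it starts.
    """
    blocks = split_into_blocks(sequence, block_size, overlap=len(motif) - 1)
    with ThreadPoolExecutor() as pool:
        partial_counts = pool.map(partial(count_motif_in_block, motif=motif), blocks)
        # Reduce step: combine the per-block counts into one total.
        return sum(partial_counts)


# Hypothetical example data, not taken from the paper's datasets.
example_sequence = "ACGTACGTTACGACGT"
total = count_motif_parallel(example_sequence, "ACG", block_size=5)
print(total)
```

In this sketch every run rereads every block, which is the repeated-access cost the abstract identifies as the problem the enhanced workflow is meant to reduce.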

Index Terms

Computer Science

Information Sciences

Keywords

Cloud Computing, Hadoop, Bioinformatics, BigData