Improving Current Hadoop MapReduce Workflow and Performance

International Journal of Computer Applications
© 2015 by IJCA Journal
Volume 116 - Number 15
Year of Publication: 2015
Authors:
Hamoud Alshammari
Jeongkyu Lee
Hassan Bajwa
DOI: 10.5120/20414-2828

Hamoud Alshammari, Jeongkyu Lee and Hassan Bajwa. Article: Improving Current Hadoop MapReduce Workflow and Performance. International Journal of Computer Applications 116(15):38-42, April 2015. Full text available.

BibTeX

@article{key:article,
	author = {Hamoud Alshammari and Jeongkyu Lee and Hassan Bajwa},
	title = {Article: Improving Current Hadoop MapReduce Workflow and Performance},
	journal = {International Journal of Computer Applications},
	year = {2015},
	volume = {116},
	number = {15},
	pages = {38-42},
	month = {April},
	note = {Full text available}
}

Abstract

This study proposes and implements an enhanced Hadoop MapReduce workflow that improves on the performance of the current Hadoop MapReduce. The architecture speeds up the processing of big data by improving several parameters of the processing jobs. Big data must be divided into many datasets, or blocks, and distributed across many nodes in the cluster, so that tasks can access and process these blocks in parallel. However, reading the same datasets every time a job is executed causes a data-overloading problem, so we modified the current MapReduce workflow to improve performance in terms of the amount of data read by related jobs. This work uses bioinformatics DNA datasets to implement the solution.
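The abstract does not spell out how the amount of data read is reduced, so the following is only an illustrative sketch and not the authors' implementation. It assumes that a first run of a job can record which blocks produced results, and that a related later job can then read only those blocks. All names (`split_into_blocks`, `count_in_block`, `run_job`, the sample DNA string) are hypothetical, and a thread pool stands in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_size):
    """Cut the input into fixed-size blocks, loosely mimicking how a
    distributed file system stores a large file."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_in_block(block, pattern):
    """Map step: count occurrences of the pattern inside one block.
    Simplification: matches that span a block boundary are not counted."""
    return block.count(pattern)


def run_job(blocks, pattern, block_ids=None):
    """Run the map step over the selected blocks in parallel, then reduce
    by summing. Returns (total, per_block_counts, blocks_read)."""
    ids = list(range(len(blocks))) if block_ids is None else list(block_ids)
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(lambda i: count_in_block(blocks[i], pattern), ids))
    return sum(counts), dict(zip(ids, counts)), len(ids)


# Hypothetical DNA data: four blocks of eight bases each.
dna = "ACGTACGT" + "TTTTTTTT" + "GGGGACGT" + "CCCCCCCC"
blocks = split_into_blocks(dna, 8)

# First run reads every block and records which ones contained the pattern.
first_total, per_block, first_read = run_job(blocks, "ACGT")
relevant_ids = [i for i, count in per_block.items() if count > 0]

# A related second run (assumption) reads only the recorded blocks.
second_total, _, second_read = run_job(blocks, "ACGT", relevant_ids)

print(first_total, first_read)    # 3 4
print(second_total, second_read)  # 3 2
```

In this toy case the second run returns the same count while reading two blocks instead of four, which is the kind of saving in data read that the abstract refers to. How the actual system identifies the relevant blocks is described in the full paper, not here.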
