Energy Efficient and Reliable Job Submission in Hadoop Clusters

IJCA Special Issue on Information Processing and Remote Computing
© 2012 by IJCA Journal
IPRC - Number 1
Year of Publication: 2012
Authors:
G Sudha Sadasivam
S Sangeetha
R Radhakrishnan

G Sudha Sadasivam, S Sangeetha and R Radhakrishnan. Energy Efficient and Reliable Job Submission in Hadoop Clusters. IJCA Special Issue on Information Processing and Remote Computing IPRC(1):6-11, August 2012. Full text available.

@article{key:article,
	author = {G Sudha Sadasivam and S Sangeetha and R Radhakrishnan},
	title = {Energy Efficient and Reliable Job Submission in Hadoop Clusters},
	journal = {IJCA Special Issue on Information Processing and Remote Computing},
	year = {2012},
	volume = {IPRC},
	number = {1},
	pages = {6--11},
	month = {August},
	note = {Full text available}
}

Abstract

The MapReduce paradigm is well suited to large-scale, data-intensive applications in the cloud environment. The scale of these applications makes it necessary to minimize cluster power consumption in order to reduce operational costs and carbon footprint. Energy consumption can be reduced by selectively powering down nodes during periods of low utilization. Hadoop is used mainly for batch processing of large jobs. Before jobs are submitted, the files they use are uploaded into the cluster. Each file is split into a number of chunks, which are distributed across the Hadoop cluster. This paper addresses the problem of block allocation in a distributed file system to improve reliability and energy efficiency. A framework has been designed that reduces the power requirements of a cluster by identifying the number of replicas and their placement needed for reliable completion of a job. It addresses issues such as block allocation, reliable job submission and minimization of the number of active cluster nodes to reduce power consumption. The framework is integrated with Hadoop's NameNode. The scheduler component in Hadoop has also been modified so that jobs are submitted to active data nodes containing the data to be operated on. A greedy approach and an evolutionary approach using Particle Swarm Optimization (PSO) have been designed to identify suitable nodes to be activated in a cluster. Experimental results demonstrate the performance of these approaches.
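
The abstract does not give the details of the greedy approach, so the following is only an illustrative sketch of one way such a selection could work. It assumes the problem can be treated as a set cover: given which blocks each data node holds, repeatedly activate the node that holds the most blocks not yet available on an active node. The function name `greedy_active_nodes`, the `placement` mapping and the node and block names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_active_nodes(replicas: Dict[str, Set[str]]) -> List[str]:
    """Return a small list of nodes that together hold every block.

    `replicas` maps a node name to the set of block IDs stored on it.
    This is the classic greedy set-cover heuristic, used here as an
    assumed stand-in for the paper's greedy approach.
    """
    # Every block that appears on at least one node must be covered.
    uncovered: Set[str] = set().union(*replicas.values())
    active: List[str] = []
    while uncovered:
        # Pick the node holding the most still-uncovered blocks.
        # Iterating over sorted names makes ties resolve to the
        # alphabetically first node, so the result is deterministic.
        best = max(sorted(replicas), key=lambda n: len(replicas[n] & uncovered))
        active.append(best)
        uncovered -= replicas[best]
    return active


# Hypothetical block placement for a four-node cluster.
placement = {
    "node1": {"b1", "b2", "b3"},
    "node2": {"b2", "b4"},
    "node3": {"b3", "b4", "b5"},
    "node4": {"b5"},
}
print(greedy_active_nodes(placement))  # ['node1', 'node3']
```

In this example two of the four nodes suffice to serve all five blocks, so the other two could be powered down. A real implementation would also have to account for replication requirements and node reliability, which this sketch ignores.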
