Research Article

Improved Fault Tolerant Job Scheduler for Optimal Resource Utilization in Computational Grid

by P. Latchoumy, P. Sheik Abdul Khader
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 48 - Number 22
Year of Publication: 2012
DOI: 10.5120/7510-0552

P. Latchoumy, P. Sheik Abdul Khader. Improved Fault Tolerant Job Scheduler for Optimal Resource Utilization in Computational Grid. International Journal of Computer Applications. 48, 22 (June 2012), 6-12. DOI=10.5120/7510-0552

@article{ 10.5120/7510-0552,
author = { P. Latchoumy and P. Sheik Abdul Khader },
title = { Improved Fault Tolerant Job Scheduler for Optimal Resource Utilization in Computational Grid },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 48 },
number = { 22 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 6-12 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume48/number22/7510-0552/ },
doi = { 10.5120/7510-0552 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A P. Latchoumy
%A P. Sheik Abdul Khader
%T Improved Fault Tolerant Job Scheduler for Optimal Resource Utilization in Computational Grid
%J International Journal of Computer Applications
%@ 0975-8887
%V 48
%N 22
%P 6-12
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Grid computing provides the ability to access, utilize and control a variety of underutilized heterogeneous resources distributed across multiple administrative domains, but it is also an error-prone environment: resource failures affect jobs during runtime. We propose a new strategy, the Improved Fault Tolerant Job Scheduler (IFTJS), for optimal resource utilization in a computational grid. It schedules grid jobs so that faults are tolerated gracefully and more jobs complete successfully within their specified deadlines. The system maintains a history of fault occurrences for each resource with respect to processor, memory and bandwidth. Using this history reduces the chance of selecting resources with a high failure probability and hence improves resource utilization. The system also guarantees efficient job execution using a Reduced Recovery Time (RRT) strategy. Whenever the scheduler has jobs to schedule, the Improved Fault Tolerant (IFT) algorithm finds the optimal resources based on their failure rate: the resources with the lowest failure rate have the highest scheduling priority. The job manager monitors job execution and returns the results to the user after successful completion. If a failure occurs, the job is resumed from its last saved state: on the same resource when that resource's failure rate is below the optimal value, or, under the RRT strategy, on a backup resource when the failure rate exceeds the optimal value. Otherwise, the failed job is rescheduled on the next available optimal resource, again from the last saved state. This reduces recovery time. The approach is effective because the resource manager detects resource failures and the job manager guarantees that submitted jobs are executed on optimal resources within the specified deadline.
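The selection and recovery rules described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' IFT algorithm. The abstract does not define how the failure rate is computed, so the sketch assumes it is the number of recorded faults (across processor, memory and bandwidth) divided by the number of jobs assigned to the resource. The parameter `optimal_fr` stands in for the abstract's unspecified "optimal value". The names `Resource`, `rank_resources` and `recovery_target` are hypothetical, and the final "otherwise" branch is read as the case where no backup resource is available. Checkpointing itself is not modeled; each returned action assumes the job resumes from its last saved state.

```python
from dataclasses import dataclass, field
from enum import Enum


class FaultType(Enum):
    """Fault categories whose history is kept per resource."""
    PROCESSOR = "processor"
    MEMORY = "memory"
    BANDWIDTH = "bandwidth"


@dataclass
class Resource:
    """Hypothetical grid resource with a simple fault history."""
    name: str
    jobs_assigned: int = 0
    faults: dict = field(default_factory=lambda: {t: 0 for t in FaultType})

    def record_assignment(self) -> None:
        self.jobs_assigned += 1

    def record_fault(self, kind: FaultType) -> None:
        self.faults[kind] += 1

    @property
    def failure_rate(self) -> float:
        # ASSUMPTION: failure rate = total recorded faults / jobs assigned.
        if self.jobs_assigned == 0:
            return 0.0
        return sum(self.faults.values()) / self.jobs_assigned


def rank_resources(resources):
    """Order resources so the lowest failure rate (highest priority) comes first."""
    return sorted(resources, key=lambda r: r.failure_rate)


def recovery_target(failed, backups, available, optimal_fr):
    """Choose where a failed job resumes from its last saved state.

    Returns (resource, action). ASSUMPTIONS: `optimal_fr` is the threshold
    the abstract calls the "optimal value"; a rate equal to it is treated
    as not below it; "otherwise" means no backup resource is available.
    """
    if failed.failure_rate < optimal_fr:
        return failed, "retry-same"                      # same resource
    if backups:
        return rank_resources(backups)[0], "backup"      # RRT: backup resource
    candidates = [r for r in available if r is not failed]
    if not candidates:
        return None, "no-resource"
    return rank_resources(candidates)[0], "reschedule"   # next optimal resource


if __name__ == "__main__":
    a, b, c = Resource("A"), Resource("B"), Resource("C")
    for r in (a, b, c):
        for _ in range(10):
            r.record_assignment()
    a.record_fault(FaultType.MEMORY)               # A: 1 fault in 10 jobs
    for _ in range(4):
        b.record_fault(FaultType.PROCESSOR)        # B: 4 faults in 10 jobs
    print([r.name for r in rank_resources([a, b, c])])  # ['C', 'A', 'B']
    print(recovery_target(a, [], [a, b, c], 0.3)[1])     # retry-same
    print(recovery_target(b, [c], [a, b, c], 0.3)[1])    # backup
    print(recovery_target(b, [], [a, b, c], 0.3)[1])     # reschedule
```

Ranking the resource list once per scheduling round and giving the head of the list to each waiting job reproduces the "lowest failure rate, highest priority" rule; a real scheduler would also check deadlines and current load, which this sketch ignores.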

Index Terms

Computer Science
Information Sciences

Keywords

Improved Fault-Tolerant Job Scheduler (IFTJS), Failure Rate, Checkpointing Time, Reduced Recovery Time (RRT), Optimal Resources, Job Manager, Resource Manager, Utilization Rate