Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing

Antony Lidya Therasa.S; Sumathi.G; Antony Dalya.S

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 2 - Number 3

Year of Publication: 2010

Authors: Antony Lidya Therasa.S, Sumathi.G, Antony Dalya.S

10.5120/636-891

Antony Lidya Therasa.S, Sumathi.G, Antony Dalya.S. Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing. International Journal of Computer Applications. 2, 3 (May 2010), 95-99. DOI=10.5120/636-891

@article{ 10.5120/636-891,
  author = { Antony Lidya Therasa.S, Sumathi.G, Antony Dalya.S },
  title = { Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing },
  journal = { International Journal of Computer Applications },
  issue_date = { May 2010 },
  volume = { 2 },
  number = { 3 },
  month = { May },
  year = { 2010 },
  issn = { 0975-8887 },
  pages = { 95-99 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume2/number3/636-891/ },
  doi = { 10.5120/636-891 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T19:50:00.261520+05:30
%A Antony Lidya Therasa.S
%A Sumathi.G
%A Antony Dalya.S
%T Dynamic Adaptation of Checkpoints and Rescheduling in Grid Computing
%J International Journal of Computer Applications
%@ 0975-8887
%V 2
%N 3
%P 95-99
%D 2010
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Grid computing is a form of distributed computing that virtualizes and utilizes geographically distributed idle resources. A grid is a distributed computational and storage environment, often composed of heterogeneous, autonomously managed subsystems. As a result, varying resource availability becomes commonplace, often resulting in the loss or delay of executing jobs. To ensure good performance, fault tolerance must be taken into account. Here we address fault tolerance in terms of resource failure. A commonly used technique for achieving fault tolerance is periodic checkpointing, which periodically saves the job's state. However, an inappropriate checkpointing interval delays job execution and reduces throughput. The proposed work therefore achieves fault tolerance by dynamically adapting the checkpoints based on the current status and failure history of the resource, which are maintained in the information server. The Last Failure Time and Mean Failure Time based algorithm dynamically modifies the checkpoint interval, increasing throughput by reducing unnecessary checkpoint overhead. In case of resource failure, the proposed Fault Index Based Rescheduling (FIBR) algorithm reschedules the job from the failed resource to another available resource with the least fault-index value and resumes the job from the last saved checkpoint. This helps the job complete within its deadline with increased throughput and makes the grid environment more trustworthy.
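As an illustration only, the two mechanisms described in the abstract could be sketched in Python as follows. The abstract does not give the actual update rule or the definition of the fault index, so everything here is an assumption: the `Resource` fields, the doubling/halving rule and bounds in `adapt_interval`, and the `pick_resource` helper are hypothetical names chosen for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """Hypothetical record held by the information server for one resource."""
    name: str
    available: bool
    fault_index: int          # assumed: higher means more failure-prone
    last_failure_time: float  # assumed: time of the most recent failure (s)
    mean_failure_time: float  # assumed: mean time between failures (s)


def adapt_interval(interval: float, now: float, res: Resource,
                   factor: float = 2.0,
                   min_interval: float = 60.0,
                   max_interval: float = 3600.0) -> float:
    """Return a new checkpoint interval for a job running on `res`.

    Assumed rule (not taken from the paper): if the resource has been up
    longer than its mean failure time, checkpoint less often; otherwise
    checkpoint more often. The result is clamped to the given bounds.
    """
    time_since_failure = now - res.last_failure_time
    if time_since_failure > res.mean_failure_time:
        new_interval = interval * factor
    else:
        new_interval = interval / factor
    return max(min_interval, min(max_interval, new_interval))


def pick_resource(resources: List[Resource],
                  failed_name: str) -> Optional[Resource]:
    """Pick the available resource with the least fault index.

    The failed resource is excluded. Returns None when no other resource
    is available; the job would then resume from its last saved checkpoint
    only once a candidate exists.
    """
    candidates = [r for r in resources
                  if r.available and r.name != failed_name]
    return min(candidates, key=lambda r: r.fault_index, default=None)
```

In this sketch a scheduler would call `adapt_interval` before each checkpoint decision and call `pick_resource` when a resource failure is reported, restarting the job on the returned resource from the last saved checkpoint.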

Index Terms

Computer Science

Information Sciences

Keywords

Grid Computing, Fault-Tolerance, Checkpointing