A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication

S. Veerapandi; S. Gavaskar; A. Sumithra

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 157 - Number 1

Year of Publication: 2017

Authors: S. Veerapandi, S. Gavaskar, A. Sumithra

10.5120/ijca2017912614

S. Veerapandi, S. Gavaskar, A. Sumithra. A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication. International Journal of Computer Applications. 157, 1 (Jan 2017), 43-48. DOI=10.5120/ijca2017912614

@article{10.5120/ijca2017912614,
  author = {S. Veerapandi and S. Gavaskar and A. Sumithra},
  title = {A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication},
  journal = {International Journal of Computer Applications},
  issue_date = {Jan 2017},
  volume = {157},
  number = {1},
  month = {Jan},
  year = {2017},
  issn = {0975-8887},
  pages = {43-48},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume157/number1/26799-2017912614/},
  doi = {10.5120/ijca2017912614},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:02:48.653717+05:30
%A S. Veerapandi
%A S. Gavaskar
%A A. Sumithra
%T A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication
%J International Journal of Computer Applications
%@ 0975-8887
%V 157
%N 1
%P 43-48
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Protecting a distributed environment against failures is an important concern today. Many fault tolerance techniques have been developed, and each has its own merits and demerits. The efficiency of an algorithm depends on how much replication it performs and on the extent to which fault tolerance is achieved. We propose a new method that uses both checkpointing and replication to ensure consistency in a distributed environment. The method is also easy to implement.
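
The abstract does not spell out the mechanism, so the following is only a generic sketch of how checkpointing and replication can be combined. It is not the authors' algorithm. All names (`Replica`, `HybridStore`, `checkpoint_every`) are hypothetical, the "checkpoint" is an in-memory copy standing in for stable storage, and failures are simulated by method calls rather than detected.

```python
import copy


class Replica:
    """One copy of the service state plus its most recent checkpoint."""

    def __init__(self, name):
        self.name = name
        self.state = {}        # volatile state, lost on a crash
        self.checkpoint = {}   # stands in for stable storage (assumption)
        self.alive = True

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)

    def crash(self):
        # Simulated failure: volatile state is lost, the checkpoint survives.
        self.alive = False
        self.state = {}

    def restart(self):
        # Roll back to the last checkpoint.
        self.state = copy.deepcopy(self.checkpoint)
        self.alive = True


class HybridStore:
    """Writes go to every live replica; live replicas checkpoint periodically."""

    def __init__(self, names, checkpoint_every=2):
        self.replicas = {n: Replica(n) for n in names}
        self.checkpoint_every = checkpoint_every
        self.writes = 0

    def _live(self):
        return [r for r in self.replicas.values() if r.alive]

    def write(self, key, value):
        live = self._live()
        if not live:
            raise RuntimeError("no live replica")
        for r in live:
            r.state[key] = value
        self.writes += 1
        if self.writes % self.checkpoint_every == 0:
            for r in live:
                r.take_checkpoint()

    def read(self, key):
        live = self._live()
        if not live:
            raise RuntimeError("no live replica")
        return live[0].state.get(key)

    def recover(self, name):
        # Peers that were alive before this replica restarts.
        peers = [r for r in self._live() if r.name != name]
        replica = self.replicas[name]
        replica.restart()  # first roll back to the local checkpoint
        if peers:
            # Then catch up from a live peer so the copies agree again.
            replica.state = copy.deepcopy(peers[0].state)
```

In this sketch replication keeps the service readable while one copy is down, and the checkpoint bounds what is lost when no peer is available: a replica that recovers alone resumes from its last checkpoint and loses only the writes made after it.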

Index Terms

Computer Science

Information Sciences

Keywords

FTPA, PLR, GiFT