
A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
S. Veerapandi, S. Gavaskar, A. Sumithra

S Veerapandi, S Gavaskar and A Sumithra. A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication. International Journal of Computer Applications 157(1):43-48, January 2017. BibTeX

	author = {S. Veerapandi and S. Gavaskar and A. Sumithra},
	title = {A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2017},
	volume = {157},
	number = {1},
	month = {Jan},
	year = {2017},
	issn = {0975-8887},
	pages = {43-48},
	numpages = {6},
	url = {},
	doi = {10.5120/ijca2017912614},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Protecting a distributed environment against failures is increasingly important. Many fault tolerance techniques have evolved so far, and each has its own merits and demerits. The efficiency of such an algorithm depends on how much replication is performed and on the extent of fault tolerance achieved. We propose a new method that uses both checkpointing and replication to ensure consistency in the distributed environment. The method is also easy to implement.
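The abstract does not spell out the algorithm, so the following is only a generic sketch of how checkpointing and replication can be combined, not the authors' method. All names (`Replica`, `HybridGroup`, `checkpoint_every`) are hypothetical, and the "distributed" replicas are simulated in a single process. The assumption illustrated is that replication masks a failure while at least one replica survives, and the checkpoint is the fallback when no live replica remains.

```python
import copy


class Replica:
    """One simulated copy of the service state (hypothetical, in-process)."""

    def __init__(self, name):
        self.name = name
        self.state = {}        # volatile state, lost on a crash
        self.checkpoint = {}   # stands in for stable storage, survives a crash
        self.alive = True

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)


class HybridGroup:
    """Applies every write to all live replicas and checkpoints periodically."""

    def __init__(self, names, checkpoint_every=2):
        self.replicas = {n: Replica(n) for n in names}
        self.checkpoint_every = checkpoint_every
        self.updates = 0

    def _live(self):
        return [r for r in self.replicas.values() if r.alive]

    def write(self, key, value):
        for r in self._live():
            r.state[key] = value
        self.updates += 1
        # Assumption: a checkpoint is taken every `checkpoint_every` updates.
        if self.updates % self.checkpoint_every == 0:
            for r in self._live():
                r.take_checkpoint()

    def read(self, key):
        live = self._live()
        if not live:
            raise RuntimeError("no live replica")
        return live[0].state.get(key)

    def crash(self, name):
        r = self.replicas[name]
        r.alive = False
        r.state = {}  # volatile state is lost

    def recover(self, name):
        failed = self.replicas[name]
        live = self._live()
        if live:
            # Replication path: copy the current state from a live peer.
            failed.state = copy.deepcopy(live[0].state)
        else:
            # Checkpoint path: roll back to the last saved state.
            failed.state = copy.deepcopy(failed.checkpoint)
        failed.alive = True


group = HybridGroup(["a", "b"], checkpoint_every=2)
group.write("x", 1)
group.write("y", 2)   # second update triggers a checkpoint
group.write("z", 3)   # not yet checkpointed
group.crash("a")
group.recover("a")    # state restored from live replica "b"
```

In this sketch, recovering from a peer loses nothing, while recovering from a checkpoint loses any update made after the last checkpoint. That trade-off is why the checkpoint interval and the number of replicas both affect the cost of fault tolerance.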

