10.5120/710-998
Rachit Garg and Praveen Kumar. A Review of Fault Tolerant Checkpointing Protocols for Mobile Computing Systems. International Journal of Computer Applications 3(2):8–19, June 2010. Published by Foundation of Computer Science.
BibTeX
@article{key:article,
author = {Rachit Garg and Praveen Kumar},
title = {A Review of Fault Tolerant Checkpointing Protocols for Mobile Computing Systems},
journal = {International Journal of Computer Applications},
year = {2010},
volume = {3},
number = {2},
pages = {8--19},
month = {June},
note = {Published By Foundation of Computer Science}
}
Abstract
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. A mobile computing system is a distributed system in which some of the processes run on mobile hosts (MHs), whose location in the network changes with time. Mobile distributed systems raise new issues such as mobility, low bandwidth of wireless channels, disconnections, limited battery power, and lack of reliable stable storage on mobile nodes. This paper addresses the problem of fault-tolerant computing in mobile distributed systems. The techniques described are based on checkpointing and rollback recovery.




