Research Article

FTM-A Middle Layer Architecture for Fault Tolerance in Cloud Computing

Published on November 2012 by L. Arockiam, Geo Francis E
Issues and Challenges in Networking, Intelligence and Computing Technologies
Foundation of Computer Science USA
ICNICT - Number 2
November 2012

L. Arockiam, Geo Francis E. FTM-A Middle Layer Architecture for Fault Tolerance in Cloud Computing. Issues and Challenges in Networking, Intelligence and Computing Technologies. ICNICT, 2 (November 2012), 12-16.

@article{
author = { L. Arockiam, Geo Francis E },
title = { FTM-A Middle Layer Architecture for Fault Tolerance in Cloud Computing },
journal = { Issues and Challenges in Networking, Intelligence and Computing Technologies },
issue_date = { November 2012 },
volume = { ICNICT },
number = { 2 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 12-16 },
numpages = 5,
url = { /specialissues/icnict/number2/9022-1029/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 Issues and Challenges in Networking, Intelligence and Computing Technologies
%A L. Arockiam
%A Geo Francis E
%T FTM-A Middle Layer Architecture for Fault Tolerance in Cloud Computing
%J Issues and Challenges in Networking, Intelligence and Computing Technologies
%@ 0975-8887
%V ICNICT
%N 2
%P 12-16
%D 2012
%I International Journal of Computer Applications
Abstract

Cloud computing has eliminated many traditional issues, such as scale, to some extent, but its stability, availability and reliability have received relatively limited attention. Because cloud computing envisages "computing as a service", it presumes 99.99% reliability, as the electricity grid has achieved. The reliability of a cloud computing system depends on the probability of failure occurring in the different layers of its architecture. Virtualization is common in cloud computing, i.e., many virtual machines, even with different operating systems, may run on a single physical machine. To achieve optimum fault tolerance for these virtual machines, this paper proposes a middle layer that can be placed between the application layer and the virtualization layer of the cloud system architecture. The purpose of this middle layer is to tolerate node failure. The layer can be seen as an assemblage of components, each with a specific functionality, and it combines various fault-tolerance strategies to achieve an optimum result. The middle layer operates automatically and is transparent to the user: it chooses among different permutations of these strategies by weighing economic factors, dependability factors and the user's interest.
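
The statement that system reliability depends on the failure probability of each layer can be illustrated with a small sketch. It is not taken from the paper. It assumes that layers fail independently, that every layer must work for the system to work (a series model), and that a replicated layer works as long as any one replica survives. The function names and the numbers are hypothetical.

```python
from math import prod


def series_reliability(failure_probs):
    """Reliability of a stack in which every layer must work.

    Assumption: layer failures are independent, so the reliabilities multiply.
    """
    return prod(1.0 - p for p in failure_probs)


def replicated_reliability(failure_prob, replicas):
    """Reliability of one layer when any single surviving replica suffices.

    Assumption: replicas fail independently with the same probability.
    """
    return 1.0 - failure_prob ** replicas


def replicas_needed(failure_prob, target):
    """Smallest replica count whose reliability meets the target."""
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while replicated_reliability(failure_prob, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical per-layer failure probabilities, e.g. hardware,
    # virtualization and application layers.
    print(series_reliability([0.01, 0.02, 0.005]))
    # Replicas needed for a 99.99% target when each node fails 5% of the time.
    print(replicas_needed(0.05, 0.9999))
```

Under these assumptions, a 5% per-node failure probability would need four replicas to exceed a 99.99% target, which hints at the trade-off between dependability and cost that the abstract mentions.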

Index Terms

Computer Science
Information Sciences

Keywords

FTM-A Middle