A Survey and Evaluation of Fault Tolerant Techniques of Virtual Machine in Cloud Computing Environment



With the rapid development of cloud computing, virtualization technology has seen considerable progress and wide application. Virtualization allows not only an application but also its operating system and virtual hardware to be encapsulated in a virtual machine (VM). Although virtualization has been widely adopted in cloud computing, many research issues remain open. Frequent VM outage incidents show that fault-tolerant techniques for VMs are needed to achieve high availability of cloud computing infrastructure. This paper introduces techniques related to VM fault tolerance and surveys existing fault-tolerant techniques based on live VM migration. It then evaluates several of these techniques under a kernel-build workload in terms of three performance metrics: downtime, total migration time (TMT), and total data transmitted (TDT). Finally, it discusses directions for future research in this field. This comparative study is intended to help VM researchers better understand the behavior and performance of different VM fault-tolerant techniques in detail.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
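To make the three metrics concrete, here is a minimal sketch of a simplified model of iterative pre-copy live migration: the general scheme in which memory is copied while the VM keeps running, and only the pages still dirty at the end are sent during a brief pause. It assumes a constant memory-dirtying rate and a constant link bandwidth, and it ignores disk state, compression, and resume overhead on the target. `simulate_precopy`, `MigrationMetrics`, and every parameter are hypothetical names chosen for illustration. They are not the evaluation method or the measurements reported in this paper.

```python
from dataclasses import dataclass


@dataclass
class MigrationMetrics:
    downtime_s: float                 # VM paused during the final stop-and-copy
    total_migration_time_s: float     # TMT: start of first round to resume on target
    total_data_transmitted_mb: float  # TDT: all memory sent over the network
    precopy_rounds: int               # number of live (pre-copy) rounds performed


def simulate_precopy(mem_mb: float, dirty_rate_mb_s: float, bandwidth_mb_s: float,
                     stop_threshold_mb: float = 1.0, max_rounds: int = 30) -> MigrationMetrics:
    """Hypothetical analytical model of iterative pre-copy live migration.

    Assumes a constant page-dirtying rate and constant link bandwidth;
    ignores disk state, compression, and resume overhead on the target.
    """
    if mem_mb < 0 or dirty_rate_mb_s < 0 or bandwidth_mb_s <= 0:
        raise ValueError("need mem_mb >= 0, dirty_rate_mb_s >= 0, bandwidth_mb_s > 0")

    to_send = mem_mb   # the first round copies the whole memory image while the VM runs
    elapsed = 0.0      # time spent in live (pre-copy) rounds
    sent = 0.0
    rounds = 0
    while rounds < max_rounds and to_send > stop_threshold_mb:
        round_time = to_send / bandwidth_mb_s
        elapsed += round_time
        sent += to_send
        rounds += 1
        # Memory dirtied while this round was on the wire must be resent next
        # round; it can never exceed the VM's total memory.
        to_send = min(mem_mb, dirty_rate_mb_s * round_time)

    # Stop-and-copy: pause the VM and send whatever is still dirty.
    downtime = to_send / bandwidth_mb_s
    sent += to_send
    return MigrationMetrics(downtime, elapsed + downtime, sent, rounds)


if __name__ == "__main__":
    # Converging case: dirty rate (100 MB/s) well below bandwidth (1000 MB/s).
    print(simulate_precopy(1000, 100, 1000, stop_threshold_mb=5))
    # Non-converging case: dirty rate exceeds bandwidth, so rounds hit max_rounds.
    print(simulate_precopy(1000, 2000, 1000))
```

In the first call the residual dirty set shrinks by the ratio of dirty rate to bandwidth each round, so downtime is about a millisecond after three rounds. In the second, the dirty set never shrinks, `max_rounds` forces a stop-and-copy of the full 1000 MB image, and downtime, TMT, and TDT all grow sharply. This is why the three metrics are best read together: a technique can shorten downtime at the cost of more rounds and more data transmitted, and workloads that dirty memory quickly make that trade-off visible.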


Virtualization Technology; Virtual Machine; Fault Tolerance; Live Migration; Cloud Computing





