A framework for proactive fault tolerance

Geoffroy Vallée, Kulathep Charoenpornwattana, Christian Engelmann, Anand Tikotekar, Chokchai Leangsuksun, Thomas Naughton, Stephen L. Scott

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

49 Scopus citations

Abstract

Fault tolerance is a major concern for guaranteeing the availability of critical services as well as application execution. Traditional approaches to fault tolerance include checkpoint/restart and duplication. However, it is also possible to anticipate failures and proactively take action before they occur in order to minimize their impact on the system and on application execution. This document presents a proactive fault tolerance framework. The framework can use different proactive fault tolerance mechanisms, i.e., migration and pause/unpause. Thanks to a modular architecture, it also allows new proactive fault tolerance policies to be implemented. A first proactive fault tolerance policy has been implemented, and preliminary experiments based on system-level virtualization have been conducted and compared with results obtained by simulation.
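The modular split described in the abstract (interchangeable policies that choose between the migration and pause/unpause mechanisms) can be illustrated with a minimal Python sketch. Every name here (`Mechanism`, `NodeHealth`, `ThresholdPolicy`, `Framework`), the temperature indicator, and the thresholds are assumptions made for illustration; the record does not describe the paper's actual interfaces or policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Protocol


class Mechanism(Enum):
    """The two mechanisms named in the abstract."""
    MIGRATE = "migrate"
    PAUSE = "pause"


@dataclass(frozen=True)
class NodeHealth:
    node: str
    temperature_c: float  # hypothetical health indicator, not from the paper


class Policy(Protocol):
    """Any object with this method can be plugged in as a policy."""
    def decide(self, health: NodeHealth) -> Optional[Mechanism]: ...


@dataclass
class ThresholdPolicy:
    """Illustrative policy: escalate from pause to migration as readings rise."""
    pause_above_c: float = 70.0    # assumed threshold
    migrate_above_c: float = 80.0  # assumed threshold

    def decide(self, health: NodeHealth) -> Optional[Mechanism]:
        if health.temperature_c >= self.migrate_above_c:
            return Mechanism.MIGRATE
        if health.temperature_c >= self.pause_above_c:
            return Mechanism.PAUSE
        return None  # node looks healthy, take no action


class Framework:
    """Routes health events through a policy to a mechanism handler."""

    def __init__(self, policy: Policy,
                 handlers: Dict[Mechanism, Callable[[str], None]]) -> None:
        self.policy = policy
        self.handlers = handlers

    def on_health_event(self, health: NodeHealth) -> Optional[Mechanism]:
        action = self.policy.decide(health)
        if action is not None:
            self.handlers[action](health.node)
        return action
```

In this sketch, swapping in a new policy only requires another object with a `decide` method; the handlers that would actually pause or migrate a virtual machine are left as caller-supplied callables.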

Original language: English
Title of host publication: ARES 2008 - 3rd International Conference on Availability, Security, and Reliability, Proceedings
Pages: 659-664
Number of pages: 6
State: Published - 2008
Event: 3rd International Conference on Availability, Security, and Reliability, ARES 2008 - Barcelona, Spain
Duration: Mar 4 2008 → Mar 7 2008

Publication series

Name: ARES 2008 - 3rd International Conference on Availability, Security, and Reliability, Proceedings

Conference

Conference: 3rd International Conference on Availability, Security, and Reliability, ARES 2008
Country/Territory: Spain
City: Barcelona
Period: 03/4/08 → 03/7/08
