Experimental assessment of workstation failures and their impact on checkpointing systems

James S. Plank, Wael R. Elwasif

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

93 Scopus citations

Abstract

In the past twenty years, there has been a wealth of theoretical research on minimizing the expected running time of a program in the presence of failures by employing checkpointing and rollback recovery. In the same time period, there has been little experimental research to corroborate these results. In this paper, we study three separate projects that monitor failure in workstation networks. Our goals are twofold. The first is to see how these results correlate with the theoretical results, and the second is to assess their impact on strategies for checkpointing long-running computations on workstations and networks of workstations. A significant result of our work is that although the base assumptions of the theoretical research do not hold, many of the results are still applicable.

Original language: English
Title of host publication: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 48-57
Number of pages: 10
ISBN (Electronic): 0818684704, 9780818684708
State: Published - 1998
Externally published: Yes
Event: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998 - Munich, Germany
Duration: Jun 23 1998 to Jun 25 1998

Publication series

Name: Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Volume: 1998-January

Conference

Conference: 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Country/Territory: Germany
City: Munich
Period: 06/23/98 to 06/25/98
