TY - GEN
T1 - Experimental assessment of workstation failures and their impact on checkpointing systems
AU - Plank, James S.
AU - Elwasif, Wael R.
N1 - Publisher Copyright: © 1998 IEEE.
PY - 1998
Y1 - 1998
N2 - In the past twenty years, there has been a wealth of theoretical research on minimizing the expected running time of a program in the presence of failures by employing checkpointing and rollback recovery. In the same time period, there has been little experimental research to corroborate these results. In this paper, we study three separate projects that monitor failure in workstation networks. Our goals are twofold. The first is to see how these results correlate with the theoretical results, and the second is to assess their impact on strategies for checkpointing long-running computations on workstations and networks of workstations. A significant result of our work is that although the base assumptions of the theoretical research do not hold, many of the results are still applicable.
UR - http://www.scopus.com/inward/record.url?scp=85014175705&partnerID=8YFLogxK
U2 - 10.1109/FTCS.1998.689454
DO - 10.1109/FTCS.1998.689454
M3 - Conference contribution
AN - SCOPUS:85014175705
T3 - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
SP - 48
EP - 57
BT - Digest of Papers - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th Annual International Symposium on Fault-Tolerant Computing, FTCS 1998
Y2 - 23 June 1998 through 25 June 1998
ER -