Fault-tolerance in the network storage stack

  • S. Atchley
  • S. Soltesz
  • J. S. Plank
  • M. Beck
  • T. Moore

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

7 Scopus citations

Abstract

This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network storage abstraction called the Network Storage Stack is presented, along with its constituent parts. In particular, a data type called the exNode is detailed, along with tools that allow it to be used to implement a wide-area, striped and replicated file. Using these tools, we evaluate the fault-tolerance of several exNode "files," composed of variable-size blocks stored on 14 different machines at five locations throughout the United States. The results demonstrate that while failures in using network storage occur frequently, the tools built on the Network Storage Stack tolerate them gracefully, and with good performance.

Original language: English
Title of host publication: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 136
Number of pages: 1
ISBN (Electronic): 0769515738, 9780769515731
State: Published - 2002
Externally published: Yes
Event: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002 - Ft. Lauderdale, United States
Duration: Apr 15 2002 to Apr 19 2002

Publication series

Name: Proceedings - International Parallel and Distributed Processing Symposium, IPDPS 2002

Conference

Conference: 16th International Parallel and Distributed Processing Symposium, IPDPS 2002
Country/Territory: United States
City: Ft. Lauderdale
Period: 04/15/02 to 04/19/02

Funding

This material is based upon work supported by the National Science Foundation under grants ACI-9876895, EIA-9975015, EIA-9972889, and ANI-9980203; the Department of Energy under the Sci-DAC/ASCR program; and the University of Tennessee Center for Information Technology Research. The authors are grateful to Norman Ramsey for granting access to the Harvard machines, Rich Wolski for granting access to the Santa Barbara machines, and Henri Casanova for granting access to the San Diego machines. Additionally, the authors acknowledge Jim Ding for help in authoring the Logistical Tools, and Alex Bassi and Yong Zheng for authoring the exNode library. Finally, the authors acknowledge Rich Wolski and Jack Dongarra for their vital participation in the Logistical Internetworking and Computing project.
