Substituting disk failure avoidance for redundancy in wide area fault tolerant storage systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The primary mechanism for overcoming faults in modern storage systems is to introduce redundancy in the form of replication and/or error-correcting codes. The costs of such redundancy in hardware, system availability, and overall complexity can be substantial, depending on the number and pattern of faults that are handled. In this paper, we describe a system that seeks to use disk failure avoidance to reduce the need for costly redundancy, relying on adaptive heuristics that predict such failures. While a number of predictive factors, such as hard drive utilization rate, age, SMART errors, and model, can be used, the initial work we present here focuses on SMART errors. Our approach can predict where near-term disk failures are more likely to occur, enabling proactive movement or replication of at-risk data and thus maintaining data integrity and availability. Our strategy can reduce the costs of redundant storage without compromising these important requirements.
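The idea of flagging at-risk disks from SMART errors can be illustrated with a minimal sketch. The abstract does not give the paper's actual heuristics, so everything below is an assumption made for illustration: the `Disk` record, its field names, the `risk_score` weighting, and the threshold are all hypothetical, and the scoring is a fixed rule rather than the adaptive heuristics the paper describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Disk:
    """Hypothetical per-disk record; field names are illustrative only."""
    name: str
    smart_error_count: int    # total SMART errors observed so far
    recent_smart_errors: int  # subset of those seen in the latest window


def risk_score(disk: Disk, recent_weight: float = 2.0) -> float:
    """Toy score: recent errors count more than older ones (assumed weighting)."""
    older = disk.smart_error_count - disk.recent_smart_errors
    return older + recent_weight * disk.recent_smart_errors


def at_risk(disks: List[Disk], threshold: float = 5.0) -> List[str]:
    """Names of disks whose score meets the assumed threshold, riskiest first."""
    scored = [(risk_score(d), d.name) for d in disks]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


disks = [
    Disk("depot-a", smart_error_count=0, recent_smart_errors=0),
    Disk("depot-b", smart_error_count=4, recent_smart_errors=3),
    Disk("depot-c", smart_error_count=2, recent_smart_errors=0),
]

# Data on these disks would be candidates for proactive movement or replication.
candidates = at_risk(disks)
```

In this sketch only the flagged disks would need extra copies of their data, which is how failure prediction could substitute for blanket redundancy; a real system would tune or learn the score from observed failures.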

Original language: English
Title of host publication: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Publisher: IEEE Computer Society
Pages: 25-31
Number of pages: 7
ISBN (Print): 9780768548449
State: Published - 2012
Externally published: Yes
Event: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012 - Beijing, China
Duration: Sep 24 2012 → Sep 28 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012

Conference

Conference: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Country/Territory: China
City: Beijing
Period: 09/24/12 → 09/28/12

Keywords

  • Hard drives
  • Logistical networking
  • SMART errors
