TY - GEN
T1 - Substituting disk failure avoidance for redundancy in wide area fault tolerant storage systems
AU - Brumgard, Christopher
AU - Beck, Micah
PY - 2012
Y1 - 2012
AB - The primary mechanism for overcoming faults in modern storage systems is to introduce redundancy in the form of replication and/or error-correcting codes. The costs of such redundancy in hardware, system availability, and overall complexity can be substantial, depending on the number and pattern of faults that are handled. In this paper, we describe a system that seeks to use disk failure avoidance to reduce the need for costly redundancy by using adaptive heuristics that predict such failures. While a number of predictive factors such as hard drive utilization rate, age, SMART errors, and model can be used, the initial work we present here focuses on SMART errors. Our approach can predict where near-term disk failures are more likely to occur, enabling proactive movement or replication of at-risk data, thus maintaining data integrity and availability. Our strategy can reduce costs due to redundant storage without compromising these important requirements.
KW - Hard drives
KW - Logistical networking
KW - SMART errors
UR - http://www.scopus.com/inward/record.url?scp=84872590084&partnerID=8YFLogxK
U2 - 10.1109/ClusterW.2012.39
DO - 10.1109/ClusterW.2012.39
M3 - Conference contribution
AN - SCOPUS:84872590084
SN - 9780769548449
T3 - Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
SP - 25
EP - 31
BT - Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
PB - IEEE Computer Society
T2 - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Y2 - 24 September 2012 through 28 September 2012
ER -