TY - JOUR
T1 - Symmetric active/active metadata service for high availability parallel file systems
AU - He, Xubin
AU - Ou, Li
AU - Engelmann, Christian
AU - Chen, Xin
AU - Scott, Stephen L.
PY - 2009/12
Y1 - 2009/12
N2 - High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution.
AB - High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution.
KW - Fault tolerance
KW - Group communication
KW - High availability
KW - Metadata management
KW - Parallel file systems
UR - http://www.scopus.com/inward/record.url?scp=70350567175&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2009.08.004
DO - 10.1016/j.jpdc.2009.08.004
M3 - Article
AN - SCOPUS:70350567175
SN - 0743-7315
VL - 69
SP - 961
EP - 973
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
IS - 12
ER -