Symmetric active/active metadata service for high availability parallel file systems

Xubin He, Li Ou, Christian Engelmann, Xin Chen, Stephen L. Scott

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

High availability data storage systems are critical for many applications as research and business become more data driven. Since metadata management is essential to system availability, multiple metadata services are used to improve the availability of distributed storage systems. Past research has focused on the active/standby model, where each active service has at least one redundant idle backup. However, interruption of service and even some loss of service state may occur during a fail-over depending on the replication technique used. In addition, the replication overhead for multiple metadata services can be very high. The research in this paper targets the symmetric active/active replication model, which uses multiple redundant service nodes running in virtual synchrony. In this model, service node failures do not cause a fail-over to a backup and there is no disruption of service or loss of service state. A fast delivery protocol is further discussed to reduce the latency of the total order broadcast needed. The prototype implementation shows that metadata service high availability can be achieved with an acceptable performance trade-off using the symmetric active/active metadata service solution.
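
As a rough illustration of the symmetric active/active idea summarized in the abstract, the following minimal Python sketch has every live replica apply the same metadata operations in the same total order, so a node failure needs no fail-over and loses no state. It is not the paper's prototype or its fast delivery protocol. The names `MetadataReplica`, `Sequencer`, the `mds0`..`mds2` labels, and the toy `create`/`remove` operations are all assumptions made for this example, and the single in-process sequencer is only a stand-in for a real group communication layer that provides total order broadcast.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataReplica:
    """One active replica holding a full copy of the (toy) metadata state."""
    name: str
    alive: bool = True
    table: dict = field(default_factory=dict)

    def apply(self, op: tuple) -> None:
        # Deterministic transition: the same ops in the same order
        # always produce the same state on every replica.
        kind, path, value = op
        if kind == "create":
            self.table[path] = value
        elif kind == "remove":
            self.table.pop(path, None)


class Sequencer:
    """Hypothetical stand-in for a total order broadcast layer (assumption)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.next_seq = 0

    def broadcast(self, op: tuple) -> int:
        # Assign a global sequence number, then deliver to all live replicas.
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            if replica.alive:
                replica.apply(op)
        return seq

    def read(self, path):
        # Any live replica can answer, since all hold the same state.
        for replica in self.replicas:
            if replica.alive:
                return replica.table.get(path)
        raise RuntimeError("no live replica")


replicas = [MetadataReplica("mds0"), MetadataReplica("mds1"), MetadataReplica("mds2")]
group = Sequencer(replicas)

group.broadcast(("create", "/data/a", {"size": 0}))
group.broadcast(("create", "/data/b", {"size": 4096}))
replicas[0].alive = False  # simulate a service node failure
group.broadcast(("remove", "/data/a", None))

result = group.read("/data/b")  # served by a surviving replica
```

In this sketch the surviving replicas keep serving requests with identical state after `mds0` fails. A real deployment would also have to order messages without a single central sequencer and handle membership changes, which the sketch leaves out.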

Original language: English
Pages (from-to): 961-973
Number of pages: 13
Journal: Journal of Parallel and Distributed Computing
Volume: 69
Issue number: 12
State: Published - Dec 2009

Funding

This work was sponsored in part by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work at Tennessee Tech University was sponsored by the Laboratory Directed Research and Development Program of ORNL, by the U.S. National Science Foundation under Grant Nos. CNS-0617528 and CNS-0720617, and by the Office of Research of Tennessee Technological University. It was performed in part at Oak Ridge National Laboratory (ORNL), which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725. The authors would like to thank the anonymous reviewers and the Elsevier editor for their valuable feedback to improve the quality of this article.

Keywords

  • Fault tolerance
  • Group communication
  • High availability
  • Metadata management
  • Parallel file systems
