TY - GEN
T1 - A robust fault-tolerant and scalable cluster-wide deduplication for shared-nothing storage systems
AU - Khan, Awais
AU - Lee, Chang Gyu
AU - Hamandawana, Prince
AU - Park, Sungyong
AU - Kim, Youngjae
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/7
Y1 - 2018/11/7
N2 - Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard which guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. The placement of chunks and deduplication metadata is made cluster-wide based on the content fingerprint of chunks. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation as well as high robustness in the event of sudden server failure.
AB - Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard which guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. The placement of chunks and deduplication metadata is made cluster-wide based on the content fingerprint of chunks. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation as well as high robustness in the event of sudden server failure.
KW - Data Deduplication
KW - Distributed Storage Systems
KW - Distributed and cloud computing
KW - Storage and file systems
UR - http://www.scopus.com/inward/record.url?scp=85058318460&partnerID=8YFLogxK
U2 - 10.1109/MASCOTS.2018.00016
DO - 10.1109/MASCOTS.2018.00016
M3 - Conference contribution
AN - SCOPUS:85058318460
T3 - Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
SP - 87
EP - 93
BT - Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
Y2 - 25 September 2018 through 28 September 2018
ER -