A robust fault-tolerant and scalable cluster-wide deduplication for shared-nothing storage systems

Awais Khan, Chang Gyu Lee, Prince Hamandawana, Sungyong Park, Youngjae Kim

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

19 Scopus citations

Abstract

Deduplication has been widely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design requirements of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes that are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant, and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard that guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. Chunks and their deduplication metadata are placed cluster-wide based on the content fingerprint of each chunk. To ensure transactional consistency and identify garbage, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation, as well as high robustness in the event of sudden server failure.
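The abstract names three cooperating mechanisms: fingerprint-based placement of chunks and their metadata, a deduplication index sharded across nodes with no central server, and a flag-based asynchronous pass that restores consistency and finds garbage after a failure. The minimal in-memory Python sketch below shows one way these pieces could fit together. It is not the authors' Ceph implementation. The fixed 4-byte chunk size, SHA-256 fingerprints, rendezvous hashing (standing in for Ceph's CRUSH placement), and the hypothetical `dirty` flag and `scrub()` pass are all illustrative assumptions, and replication, persistence, and concurrency are ignored.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical fixed chunk size, tiny so the demo is easy to follow


@dataclass
class ChunkEntry:
    data: bytes
    refcount: int = 0
    dirty: bool = False  # hypothetical flag: refcount not yet confirmed against committed objects


@dataclass
class Node:
    name: str
    shard: dict = field(default_factory=dict)  # this node's slice of the index: fingerprint -> ChunkEntry


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def place(fp: str, nodes: list) -> Node:
    # Rendezvous hashing (a stand-in for CRUSH): any client computes the same owner
    # from the fingerprint alone, so no central metadata server is consulted.
    return max(nodes, key=lambda n: hashlib.sha256(f"{n.name}:{fp}".encode()).digest())


class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.recipes = {}  # committed objects: name -> ordered chunk fingerprints

    def write(self, name: str, data: bytes, crash_before_commit: bool = False) -> None:
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
        fps = [fingerprint(c) for c in chunks]
        for chunk, fp in zip(chunks, fps):
            # Duplicate chunks map to the same node and the same entry, so they are stored once.
            entry = place(fp, self.nodes).shard.setdefault(fp, ChunkEntry(chunk))
            entry.refcount += 1
            entry.dirty = True  # the new reference is tentative until reconciled
        if crash_before_commit:
            return  # simulated server failure: chunks written, object never committed
        old = self.recipes.get(name, [])
        self.recipes[name] = fps  # commit point
        for fp in old:  # references dropped by an overwrite are reconciled lazily
            place(fp, self.nodes).shard[fp].dirty = True

    def scrub(self) -> int:
        """Background pass: fix dirty refcounts from committed recipes; return garbage chunks removed."""
        truth = Counter(fp for fps in self.recipes.values() for fp in fps)
        removed = 0
        for node in self.nodes:
            for fp, entry in list(node.shard.items()):
                if not entry.dirty:
                    continue
                entry.refcount, entry.dirty = truth[fp], False
                if entry.refcount == 0:
                    del node.shard[fp]  # no committed object references it: garbage
                    removed += 1
        return removed

    def chunk_count(self) -> int:
        return sum(len(n.shard) for n in self.nodes)

    def read(self, name: str) -> bytes:
        return b"".join(place(fp, self.nodes).shard[fp].data for fp in self.recipes[name])


if __name__ == "__main__":
    c = Cluster(["osd0", "osd1", "osd2"])
    c.write("a", b"ABCDABCDWXYZ")  # chunks ABCD, ABCD, WXYZ
    c.write("b", b"ABCDQRST", crash_before_commit=True)  # ABCD deduplicated, QRST orphaned
    print(c.chunk_count())  # 3
    print(c.scrub())  # 1
    print(c.chunk_count())  # 2
    print(c.read("a"))  # b'ABCDABCDWXYZ'
```

Because every node derives a chunk's owner from its fingerprint alone, identical chunks written by different clients land on the same node and are stored once without consulting a central index. In this sketch, marking new references as tentative and reconciling them lazily keeps the write path simple at the cost of a background scan; a real system would also have to handle scrubs that race with in-flight writes.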

Original language: English
Title of host publication: Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 87-93
Number of pages: 7
ISBN (Electronic): 9781538668863
State: Published - Nov 7 2018
Externally published: Yes
Event: 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 - Milwaukee, United States
Duration: Sep 25 2018 to Sep 28 2018

Publication series

Name: Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018

Conference

Conference: 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018
Country/Territory: United States
City: Milwaukee
Period: 09/25/18 to 09/28/18

Funding

This work was supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2014-0-00035).

Funders
Institute for Information and Communications Technology Promotion
Ministry of Science and ICT, South Korea

    Keywords

    • Data Deduplication
    • Distributed Storage Systems
    • Distributed and cloud computing
    • Storage and file systems
