Abstract
Deduplication is widely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design requirements of shared-nothing distributed storage systems, such as the absence of a central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies between data and deduplication metadata. In this paper, we propose a robust, fault-tolerant, and scalable cluster-wide deduplication scheme that eliminates duplicate copies across the cluster. We design a distributed deduplication metadata shard that provides performance scalability while preserving the design constraints of shared-nothing storage systems. Chunks and deduplication metadata are placed cluster-wide based on the content fingerprint of each chunk. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation, as well as high robustness in the event of sudden server failure.
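The two mechanisms named in the abstract, fingerprint-based placement and flag-based consistency, can be illustrated with a minimal in-memory sketch. It is not the paper's Ceph implementation. The names `Node`, `Cluster`, `node_for`, `commit`, and `collect_garbage` are hypothetical, fixed-size chunking and SHA-256 fingerprints are assumptions, the modulo placement is a simple stand-in for a real cluster placement function, and the flag is set synchronously here where the paper describes an asynchronous mechanism.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only (assumption)


class Node:
    """One storage server holding chunks plus its own shard of dedup metadata."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes
        self.meta = {}    # fingerprint -> {"refs": int, "valid": bool}


class Cluster:
    """Hypothetical shared-nothing cluster with no central metadata index."""

    def __init__(self, num_nodes):
        self.nodes = [Node() for _ in range(num_nodes)]

    def node_for(self, fp):
        # Placement depends only on the fingerprint, so any client can
        # locate a chunk and its metadata without a central lookup.
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def write(self, data):
        """Store data and return its list of chunk fingerprints (the recipe)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            node = self.node_for(fp)
            entry = node.meta.get(fp)
            if entry is None:
                # New chunk: store it with the flag cleared until committed.
                node.chunks[fp] = chunk
                node.meta[fp] = {"refs": 1, "valid": False}
            else:
                # Duplicate chunk: only the reference count changes.
                entry["refs"] += 1
            recipe.append(fp)
        return recipe

    def commit(self, recipe):
        """Mark every chunk of a completed write as valid."""
        for fp in recipe:
            self.node_for(fp).meta[fp]["valid"] = True

    def collect_garbage(self):
        """Drop chunks whose flag was never set; return how many were removed."""
        removed = 0
        for node in self.nodes:
            stale = [fp for fp, m in node.meta.items() if not m["valid"]]
            for fp in stale:
                del node.meta[fp]
                del node.chunks[fp]
                removed += 1
        return removed

    def read(self, recipe):
        """Reassemble an object from its chunk fingerprints."""
        return b"".join(self.node_for(fp).chunks[fp] for fp in recipe)
```

In this sketch, a write that is interrupted before `commit` leaves chunks with a cleared flag, which `collect_garbage` can later identify and remove without consulting any other node.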
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 87-93 |
| Number of pages | 7 |
| ISBN (Electronic) | 9781538668863 |
| State | Published - Nov 7 2018 |
| Externally published | Yes |
| Event | 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 - Milwaukee, United States; Duration: Sep 25 2018 → Sep 28 2018 |
Publication series
| Name | Proceedings - 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 |
|---|---|
Conference
| Conference | 26th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2018 |
|---|---|
| Country/Territory | United States |
| City | Milwaukee |
| Period | 09/25/18 → 09/28/18 |
Funding
This work was supported by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2014-0-00035).
Keywords
- Data Deduplication
- Distributed Storage Systems
- Distributed and cloud computing
- Storage and file systems