Abstract
High-performance computing (HPC) facilities have deployed flash-based storage tiers near compute nodes to absorb the high I/O demand that HPC applications generate during periodic system-level checkpoints. To accelerate these checkpoints, proxy-based distributed key-value stores (PD-KVS) have gained particular attention for their flexibility in supporting multiple backends and different network configurations. PD-KVS rely internally on monolithic KVS, such as LevelDB or RocksDB, to exploit their KV interface and query support. However, PD-KVS are unaware of the high redundancy in checkpoint data, which can range from GBs to TBs, and therefore tend to generate high write and space amplification on these storage tiers. In this paper, we propose DenKv, a deduplication-extended, node-local, LSM-tree-based KVS. DenKv employs asynchronous partially inline dedup (APID) and aims to maintain the performance characteristics of LSM-tree-based KVS while reducing write and space amplification. We implemented DenKv atop BlobDB and showed that it maintains performance while reducing write amplification by up to 2× and space amplification by 4× on average.
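To illustrate why redundant checkpoint data inflates space usage and how deduplication counters it, here is a minimal Python sketch of generic, whole-value inline deduplication in a key-value store. Identical values are stored once under a SHA-256 fingerprint, and each key keeps only a reference. This is not DenKv's APID scheme, whose mechanics the abstract does not describe. The class `DedupKVStore`, its methods, and the key naming are hypothetical and serve only as an illustration.

```python
import hashlib


class DedupKVStore:
    """Toy KV store that deduplicates identical values by content hash.

    Hypothetical illustration only: plain inline, whole-value dedup,
    not DenKv's asynchronous partially inline dedup (APID).
    """

    def __init__(self) -> None:
        self.index = {}          # key -> fingerprint of its current value
        self.blobs = {}          # fingerprint -> value bytes, stored once
        self.refcount = {}       # fingerprint -> number of keys referencing it
        self.logical_bytes = 0   # live bytes the application believes it stored

    def put(self, key: str, value: bytes) -> None:
        if key in self.index:    # overwrite: drop the old reference first
            self.delete(key)
        fp = hashlib.sha256(value).hexdigest()
        if fp not in self.blobs:  # first copy of this content: write the bytes
            self.blobs[fp] = value
            self.refcount[fp] = 0
        self.refcount[fp] += 1    # duplicate content only bumps a counter
        self.index[key] = fp
        self.logical_bytes += len(value)

    def get(self, key: str) -> bytes:
        return self.blobs[self.index[key]]

    def delete(self, key: str) -> None:
        fp = self.index.pop(key)
        self.logical_bytes -= len(self.blobs[fp])
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:  # last reference gone: reclaim the blob
            del self.refcount[fp]
            del self.blobs[fp]

    def physical_bytes(self) -> int:
        return sum(len(v) for v in self.blobs.values())


# Eight hypothetical ranks checkpoint an identical 4 KiB chunk.
store = DedupKVStore()
chunk = b"x" * 4096
for rank in range(8):
    store.put(f"ckpt/rank{rank}/chunk0", chunk)
print(store.logical_bytes, store.physical_bytes())  # 32768 4096
```

In this toy model, 32 KiB of logical checkpoint data occupies only 4 KiB on the device. A real LSM-tree store would add its own flush- and compaction-driven write amplification on top, which the sketch ignores.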
Original language | English |
---|---|
Title of host publication | Proceedings of PDSW 2022 |
Subtitle of host publication | 7th International Parallel Data Systems Workshop, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 20-25 |
Number of pages | 6 |
ISBN (Electronic) | 9781665475624 |
State | Published - 2022 |
Event | 7th IEEE/ACM International Parallel Data Systems Workshop, PDSW 2022, Dallas, United States, Nov 13 2022 → Nov 18 2022 |
Publication series
Name | Proceedings of PDSW 2022: 7th International Parallel Data Systems Workshop, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|
Conference
Conference | 7th IEEE/ACM International Parallel Data Systems Workshop, PDSW 2022 |
---|---|
Country/Territory | United States |
City | Dallas |
Period | 11/13/22 → 11/18/22 |
Funding
This work was supported in part by the Korea Institute of Science and Technology Information (Grant No. J-22-NB-C03-S01) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2014386). This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Y. Kim is the corresponding author.
Keywords
- Deduplication
- High Performance Computing
- Key-Value Stores
- Log-Structured Merge Tree