Abstract
Log-Structured Merge Tree (LSM-tree) based key-value stores excel in write-intensive environments but suffer from data duplication, which consumes up to 49% of storage space in deployments. Traditional solutions such as compression and coarse-grained file-system-level deduplication introduce overhead or have limited effectiveness. In this study, we propose DedupKV, a fine-grained deduplication framework tailored for LSM-trees that maximizes data reduction efficiency while minimizing write stalls and read overheads. DedupKV features three key innovations: (1) FLUSH-integrated inline deduplication, which removes duplicates during memory-to-storage writes; (2) WAL file-based offline deduplication, which repurposes write-ahead logs to avoid double writes; and (3) elastic execution, which dynamically balances inline and offline deduplication based on memory pressure and workload intensity. Additionally, dynamic granularity management reduces deduplication metadata overhead. We implemented these four ideas in RocksDB for the first time and conducted experiments in a Linux environment. Our evaluation shows that WAL file-based offline deduplication and DedupKV outperform BlobDB by 33% and 23%, respectively, in write-heavy workloads, while reducing write amplification by 1.2×, 2×, and 1.6× for real KV datasets.
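The general idea behind flush-time (inline) deduplication can be illustrated with a small sketch. This is not DedupKV's implementation or RocksDB's API: the function names `flush_with_dedup` and `lookup`, the use of whole-value SHA-256 fingerprints, and the in-memory dictionaries standing in for the memtable and on-disk value store are all assumptions made for illustration. It shows only that a value already present in the store is referenced by fingerprint instead of being written again.

```python
import hashlib


def flush_with_dedup(memtable, value_store):
    """Flush a toy memtable (dict of key -> bytes), writing each distinct value once.

    Hypothetical sketch: `value_store` maps a SHA-256 fingerprint to the value
    bytes and stands in for on-disk value storage. Returns the flushed table as
    a sorted list of (key, fingerprint) pairs and the number of value bytes
    that did not need to be written because an identical value already existed.
    """
    table = []
    bytes_saved = 0
    for key in sorted(memtable):
        value = memtable[key]
        fingerprint = hashlib.sha256(value).hexdigest()
        if fingerprint in value_store:
            # Duplicate value: keep only a reference to the stored copy.
            bytes_saved += len(value)
        else:
            value_store[fingerprint] = value
        table.append((key, fingerprint))
    return table, bytes_saved


def lookup(table, value_store, key):
    """Resolve a key through its fingerprint (linear scan, for clarity only)."""
    for stored_key, fingerprint in table:
        if stored_key == key:
            return value_store[fingerprint]
    return None


# Two of the three keys carry identical 100-byte values.
memtable = {
    b"user:1": b"A" * 100,
    b"user:2": b"B" * 100,
    b"user:3": b"A" * 100,
}
store = {}
table, saved = flush_with_dedup(memtable, store)
```

In this toy run, the value for `user:3` is identical to the one for `user:1`, so only two values are stored and the third key holds a reference. The extra fingerprint lookup on reads hints at the read overhead the abstract says the framework aims to minimize.
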
| Original language | English |
|---|---|
| Title of host publication | ACM ICS 2025 - Proceedings of the 39th ACM International Conference on Supercomputing |
| Publisher | Association for Computing Machinery |
| Pages | 580-595 |
| Number of pages | 16 |
| ISBN (Electronic) | 9798400715372 |
| DOIs | |
| State | Published - Aug 22 2025 |
| Event | 39th ACM International Conference on Supercomputing, ICS 2025 - Salt Lake City, United States. Duration: Jun 8 2025 → Jun 11 2025 |
Publication series
| Name | Proceedings of the International Conference on Supercomputing |
|---|---|
| Volume | Part of 213821 |
Conference
| Conference | 39th ACM International Conference on Supercomputing, ICS 2025 |
|---|---|
| Country/Territory | United States |
| City | Salt Lake City |
| Period | 06/08/25 → 06/11/25 |
Funding
This work was partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00564249 and RS-2024-00416666). This research also used resources of the Oak Ridge Leadership Computing Facility, located at the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the DOE under Contract DE-AC05-00OR22725. The work performed by HE at Temple University is partially supported by the US National Science Foundation under grant #OAC-2311758.
Keywords
- Deduplication
- Key-Value Store
- Log-Structured Merge Tree