DEDUPKV: A Space-Efficient and High-Performance Key-Value Store via Fine-Grained Deduplication

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Log-Structured Merge Tree (LSM-tree) based key-value stores excel in write-intensive environments but suffer from data duplication, which consumes up to 49% of storage space in deployed LSM-tree-based key-value stores. Traditional solutions such as compression and coarse-grained, file-system-level deduplication either introduce overhead or have limited effectiveness. In this study, we propose DedupKV, a fine-grained deduplication framework tailored for LSM-trees that maximizes data reduction while minimizing write stalls and read overhead. DedupKV features three key innovations: (1) FLUSH-integrated inline deduplication, which removes duplicates as data are written from memory to storage; (2) WAL file-based offline deduplication, which repurposes write-ahead logs to avoid double writes; and (3) elastic execution, which dynamically balances inline and offline deduplication based on memory pressure and workload intensity. In addition, dynamic granularity management reduces deduplication metadata overhead. We implemented these four techniques in RocksDB for the first time and evaluated them in a Linux environment. Our evaluation shows that, in write-heavy workloads, WAL file-based offline deduplication and DedupKV outperform BlobDB by 33% and 23%, respectively, while reducing write amplification by 1.2×, 2×, and 1.6× on real KV datasets.
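The idea behind FLUSH-integrated inline deduplication can be illustrated with a minimal sketch: while a memtable is flushed, each value is split into chunks, each chunk is fingerprinted, and only chunks whose fingerprint has not been seen before are stored. This sketch assumes fixed-size chunks, SHA-256 fingerprints, and an in-memory fingerprint index; `ChunkStore`, `flush_with_dedup`, `read_value`, and `CHUNK_SIZE` are hypothetical names, not DedupKV's or RocksDB's API, and the sketch does not model DedupKV's dynamic granularity, WAL-based offline path, or on-disk layout.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical fixed chunk size in bytes. DedupKV manages granularity
# dynamically; a constant is used here only to keep the sketch small.
CHUNK_SIZE = 8


@dataclass
class ChunkStore:
    """Hypothetical content-addressed chunk store with a fingerprint index."""
    chunks: dict[bytes, bytes] = field(default_factory=dict)   # fingerprint -> chunk
    refcount: dict[bytes, int] = field(default_factory=dict)   # fingerprint -> references

    def put(self, chunk: bytes) -> bytes:
        fp = hashlib.sha256(chunk).digest()
        if fp not in self.chunks:
            self.chunks[fp] = chunk  # first occurrence: store the chunk once
        self.refcount[fp] = self.refcount.get(fp, 0) + 1  # duplicates only add a reference
        return fp

    def get(self, fp: bytes) -> bytes:
        return self.chunks[fp]


def split(value: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a value into fixed-size chunks (the last chunk may be shorter)."""
    return [value[i:i + size] for i in range(0, len(value), size)]


def flush_with_dedup(memtable: dict[bytes, bytes], store: ChunkStore) -> dict[bytes, list[bytes]]:
    """Flush a memtable, replacing each value with its list of chunk fingerprints."""
    table: dict[bytes, list[bytes]] = {}
    for key in sorted(memtable):  # sorted-run files are written in key order
        table[key] = [store.put(chunk) for chunk in split(memtable[key])]
    return table


def read_value(table: dict[bytes, list[bytes]], store: ChunkStore, key: bytes) -> bytes:
    """Reassemble a value from its fingerprints (the extra lookup is the read overhead)."""
    return b"".join(store.get(fp) for fp in table[key])


# Usage: k1 and k2 share the same 8-byte "A" chunk four times in total.
memtable = {b"k1": b"A" * 16, b"k2": b"A" * 16 + b"B" * 8, b"k3": b"C" * 4}
store = ChunkStore()
flushed = flush_with_dedup(memtable, store)
raw_bytes = sum(len(v) for v in memtable.values())          # 44
stored_bytes = sum(len(c) for c in store.chunks.values())   # 20
```

Here 44 bytes of values shrink to 20 stored bytes, because the repeated "A" chunk is kept once with a reference count of 4. The `read_value` path shows why fine-grained deduplication trades read cost for space: every read must resolve fingerprints back to chunks.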

Original language: English
Title of host publication: ACM ICS 2025 - Proceedings of the 39th ACM International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 580-595
Number of pages: 16
ISBN (Electronic): 9798400715372
State: Published - Aug 22 2025
Event: 39th ACM International Conference on Supercomputing, ICS 2025 - Lake City, United States
Duration: Jun 8 2025 to Jun 11 2025

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: Part of 213821

Conference

Conference: 39th ACM International Conference on Supercomputing, ICS 2025
Country/Territory: United States
City: Lake City
Period: 06/8/25 to 06/11/25

Funding

This work was partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00564249 and RS-2024-00416666). This research also used resources of the Oak Ridge Leadership Computing Facility, located at the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the DOE under Contract DE-AC05-00OR22725. The work performed by HE at Temple University is partially supported by the US National Science Foundation under grant #OAC-2311758.

Keywords

  • Deduplication
  • Key-Value Store
  • Log-Structured Merge Tree
