Crocus: Enabling Computing Resource Orchestration for Inline Cluster-Wide Deduplication on Scalable Storage Systems

Prince Hamandawana, Awais Khan, Chang Gyu Lee, Sungyong Park, Youngjae Kim

Research output: Contribution to journal, Article, peer-review

12 Scopus citations

Abstract

Inline deduplication dramatically improves storage space utilization. However, it degrades I/O throughput due to compute-intensive deduplication operations such as chunking, fingerprinting or hashing of chunk content, and redundant lookup I/Os over the network in the I/O path. In particular, the fingerprint or hash generation of content is computationally expensive and contributes largely to the degraded I/O throughput. In this article, we propose Crocus, a framework that enables compute resource orchestration to enhance cluster-wide deduplication performance. Specifically, Crocus takes into account all compute resources, such as local and remote {CPU, GPU}, by managing decentralized compute pools. An opportunistic Load-Aware Fingerprint Scheduler (LAFS) distributes and offloads compute-intensive deduplication operations to compute pools in a load-aware fashion. Crocus is highly generic and can be adopted in both inline and offline deduplication with different storage tier configurations. We implemented Crocus in the Ceph scale-out storage system. Our extensive evaluation shows that Crocus reduces the fingerprinting overhead by 86 percent with a 4KB chunk size compared to Ceph with baseline deduplication, while maintaining high disk-space savings. Our proposed LAFS scheduler, when tested in different internal and external contention scenarios, also showed a 54 percent improvement over a fixed or static scheduling approach.
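The idea of routing fingerprint work to the least-loaded compute pool can be illustrated with a toy sketch. This is not the Crocus or LAFS implementation: the `ComputePool` class, the pending-chunk load metric, the use of SHA-256, and the in-memory index are all assumptions made for illustration, and every fingerprint is computed locally rather than offloaded.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 4096  # 4KB chunks, the size quoted in the abstract


@dataclass
class ComputePool:
    """Hypothetical stand-in for a local or remote CPU/GPU pool."""
    name: str
    pending: int = 0  # chunks assigned so far (toy load metric, an assumption)


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def pick_pool(pools):
    """Load-aware choice: the pool with the fewest pending chunks."""
    return min(pools, key=lambda p: p.pending)


def dedup_write(data: bytes, pools, index: dict) -> int:
    """Assign each chunk to the least-loaded pool and store unseen chunks only."""
    stored = 0
    for c in chunk(data):
        pool = pick_pool(pools)
        pool.pending += 1  # work is assigned but never drained in this toy
        # SHA-256 is an assumed fingerprint function; it runs locally here.
        fp = hashlib.sha256(c).hexdigest()
        if fp not in index:
            index[fp] = c
            stored += 1
    return stored


pools = [
    ComputePool("local-cpu"),
    ComputePool("local-gpu"),
    ComputePool("remote-cpu", pending=2),  # already busy
]
index = {}
data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE  # three identical chunks, one distinct
unique = dedup_write(data, pools, index)
```

In this sketch the four chunks go to the two idle pools while the busy remote pool is skipped, and only the two distinct chunks are stored. A static scheduler, by contrast, would assign chunks to pools in a fixed pattern regardless of their current load.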

Original language: English
Article number: 8993857
Pages (from-to): 1740-1753
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 31
Issue number: 8
State: Published - Aug 1 2020
Externally published: Yes

Funding

This work was supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea Government (Ministry of Science and ICT) under Grant NRF-2018R1A1A1A05079398.

Funders | Funder number
Ministry of Science, ICT and Future Planning | NRF-2018R1A1A1A05079398
National Research Foundation of Korea |

    Keywords

    • Distributed file systems
    • scheduling
    • storage management
