DupHunter: Flexible high-performance deduplication for Docker registries

Nannan Zhao, Hadeel Albahar, Subil Abraham, Keren Chen, Vasily Tarasov, Dimitrios Skourtis, Lukas Rupprecht, Ali Anwar, Ali R. Butt

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

42 Scopus citations

Abstract

The rise of containers has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on the infrastructure of container registries that store and serve images. Exploiting the high file redundancy in real-world container images is a promising approach to drastically reduce the demanding storage requirements of the growing registries. However, existing deduplication techniques significantly degrade the performance of registries because of the high layer restore overhead. We propose DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. DupHunter supports several configurable deduplication modes, which provide different levels of storage efficiency, durability, and performance, to support a range of uses. To mitigate the negative impact of deduplication on image download times, DupHunter introduces a two-tier storage hierarchy with a novel layer prefetch/preconstruct cache algorithm based on user access patterns. Under real workloads, in the highest data reduction mode, DupHunter reduces storage space by up to 6.9× compared to the current implementations. In the highest performance mode, DupHunter can reduce GET layer latency by up to 2.8× compared to the state of the art.
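The trade-off the abstract describes, space savings from file-level deduplication against the cost of restoring a layer on download, can be illustrated with a minimal Python sketch. This is not DupHunter's implementation: the `DedupLayerStore` class, its method names, and the representation of a layer as a mapping from file path to bytes are all assumptions made for illustration only.

```python
import hashlib


class DedupLayerStore:
    """Toy file-level deduplicating layer store (hypothetical, illustrative only)."""

    def __init__(self):
        self.files = {}    # SHA-256 digest -> file content, stored exactly once
        self.recipes = {}  # layer id -> ordered list of (path, digest)

    def put_layer(self, layer_id, layer_files):
        """Store a layer, keeping only one copy of each distinct file content."""
        recipe = []
        for path, content in sorted(layer_files.items()):
            digest = hashlib.sha256(content).hexdigest()
            self.files.setdefault(digest, content)  # no-op if content already stored
            recipe.append((path, digest))
        self.recipes[layer_id] = recipe

    def get_layer(self, layer_id):
        """Restore a layer from its recipe; this is the extra work on the GET path."""
        return {path: self.files[digest] for path, digest in self.recipes[layer_id]}

    def stored_bytes(self):
        """Bytes physically held after deduplication."""
        return sum(len(content) for content in self.files.values())


def logical_bytes(layers):
    """Bytes the layers would occupy if stored without deduplication."""
    return sum(len(c) for files in layers.values() for c in files.values())


# Two made-up layers that share one identical file.
layers = {
    "layer-a": {"bin/sh": b"shell" * 100, "etc/os-release": b"distro-1"},
    "layer-b": {"bin/sh": b"shell" * 100, "app/main.py": b"print('hi')"},
}

store = DedupLayerStore()
for layer_id, files in layers.items():
    store.put_layer(layer_id, files)

restored = store.get_layer("layer-b")
ratio = logical_bytes(layers) / store.stored_bytes()
```

In this sketch every download rebuilds the layer from its recipe, which is the restore overhead the abstract refers to. A cache of already reconstructed layers, as in the two-tier hierarchy the abstract mentions, would avoid that work for layers that are requested often.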

Original language: English
Title of host publication: Proceedings of the 2020 USENIX Annual Technical Conference, ATC 2020
Publisher: USENIX Association
Pages: 769-783
Number of pages: 15
ISBN (Electronic): 9781939133144
State: Published - 2020
Externally published: Yes
Event: 2020 USENIX Annual Technical Conference, ATC 2020 - Virtual, Online
Duration: Jul 15 2020 to Jul 17 2020

Publication series

Name: Proceedings of the 2020 USENIX Annual Technical Conference, ATC 2020

Conference

Conference: 2020 USENIX Annual Technical Conference, ATC 2020
City: Virtual, Online
Period: 07/15/20 to 07/17/20

Funding

We are thankful to the anonymous reviewers and our shepherd Abhinav Duggal for their valuable feedback. This work is sponsored in part by the National Science Foundation under grants CCF-1919113, CNS-1405697, CNS-1615411, and OAC-2004751.
