Unlocking the Unusable: A Proactive Caching Framework for Reusing Partial Overlapped Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Cache systems are widely used to speed up data retrieval. Modern HPC, data analytics, and AI/ML workloads generate vast, multi-dimensional datasets that are accessed via complex queries. However, the probability that different queries request exactly the same data is low, so a traditional key-value cache yields limited performance improvement. In this paper, we present Mosaic-Cache, a proactive and general caching framework that lets applications efficiently reuse partially overlapped data through novel overlap-aware cache interfaces for fast content-level reuse. Its core components include a metadata manager that leverages customizable indexing for fast overlap lookups, an adaptive fetch planner for dynamic cache-to-storage decisions, and an asynchronous merger that reduces cache fragmentation and redundancy. Evaluations on real-world HPC datasets show that Mosaic-Cache improves overall performance by up to 4.1× over a traditional key-value cache while adding minimal overhead in worst-case scenarios.
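
As an illustration of the idea of partial-overlap reuse, the following minimal Python sketch splits a range request into pieces already held in a cache and gaps that must be read from storage, and coalesces cached ranges to reduce fragmentation. It is not Mosaic-Cache's interface or implementation: the function names `plan_fetch` and `merge_ranges`, the one-dimensional half-open integer ranges, and the sorted-list "index" are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption for this sketch: data is addressed by 1-D half-open ranges [start, end).
Range = Tuple[int, int]


def plan_fetch(request: Range, cached: List[Range]) -> Tuple[List[Range], List[Range]]:
    """Split a request into sub-ranges served from cache (hits)
    and sub-ranges that must be read from storage (gaps)."""
    start, end = request
    hits: List[Range] = []
    gaps: List[Range] = []
    cursor = start  # everything in [start, cursor) is already planned
    for c_start, c_end in sorted(cached):
        lo, hi = max(c_start, cursor), min(c_end, end)
        if lo >= hi:
            continue  # no useful overlap with the unplanned part of the request
        if lo > cursor:
            gaps.append((cursor, lo))  # uncached hole before this cached range
        hits.append((lo, hi))
        cursor = hi
    if cursor < end:
        gaps.append((cursor, end))  # uncached tail of the request
    return hits, gaps


def merge_ranges(cached: List[Range]) -> List[Range]:
    """Coalesce overlapping or touching cached ranges into fewer entries."""
    merged: List[Range] = []
    for s, e in sorted(cached):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


cached = [(0, 10), (8, 20), (30, 40)]
hits, gaps = plan_fetch((5, 35), cached)
print(hits)                   # cached pieces that can be reused
print(gaps)                   # pieces still to be fetched from storage
print(merge_ranges(cached))   # defragmented cache metadata
```

In this sketch, an exact-match key-value cache keyed by the request `(5, 35)` would miss entirely, while the overlap-aware plan reads only the gap `(20, 30)` from storage.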

Original language: English
Title of host publication: HotStorage 2025 - Proceedings of the 2025 17th ACM Workshop on Hot Topics in Storage and File Systems
Publisher: Association for Computing Machinery, Inc
Pages: 129-136
Number of pages: 8
ISBN (Electronic): 9798400719479
State: Published - Jul 10 2025
Event: 17th ACM Workshop on Hot Topics in Storage and File Systems, HotStorage 2025 - Boston, United States
Duration: Jul 10 2025 → Jul 11 2025

Publication series

Name: HotStorage 2025 - Proceedings of the 2025 17th ACM Workshop on Hot Topics in Storage and File Systems

Conference

Conference: 17th ACM Workshop on Hot Topics in Storage and File Systems, HotStorage 2025
Country/Territory: United States
City: Boston
Period: 07/10/25 → 07/11/25

Funding

We extend our sincere gratitude to the anonymous reviewers for their constructive feedback. We also acknowledge the members of the ASU-IDI Lab for their thoughtful comments and contributions. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program, under the RAPIDS Institute. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was partially funded by the National Science Foundation under Grant Numbers 2412436 and 2443219.

Keywords

  • Cache Framework
  • Partial Overlapped Data
  • Proactive Caching
