Abstract
Cache systems are widely used to speed up data retrieval. Modern HPC, data analytics, and AI/ML workloads generate vast, multi-dimensional datasets that are accessed through complex queries. However, the probability that different queries request exactly the same data is low, so a traditional key-value cache yields limited performance improvement. In this paper, we present Mosaic-Cache, a proactive and general caching framework that lets applications reuse partially overlapping data efficiently through novel overlap-aware cache interfaces for fast content-level reuse. The core components include a metadata manager that leverages customizable indexing for fast overlap lookups, an adaptive fetch planner for dynamic cache-to-storage decisions, and an async merger that reduces cache fragmentation and redundancy. Evaluations on real-world HPC datasets show that Mosaic-Cache improves overall performance by up to 4.1× over a traditional key-value cache while adding minimal overhead in worst-case scenarios.
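To make the idea of partial-overlap reuse concrete, here is a minimal sketch of a one-dimensional range cache that serves the cached part of a query and fetches only the missing gaps. It is not Mosaic-Cache's interface or implementation: the class `OverlapCache`, its methods `plan` and `read`, the `fetch` callback, the half-open integer ranges, and the synchronous merge step are all assumptions made for illustration. The abstract describes customizable indexing, an adaptive planner, and asynchronous merging, none of which are modeled here.

```python
class OverlapCache:
    """Toy 1-D range cache (hypothetical, for illustration only).

    Entries are (start, data) pairs covering the half-open range
    [start, start + len(data)). They are kept sorted and non-overlapping.
    """

    def __init__(self):
        self._entries = []

    def plan(self, start, end):
        """Split the query [start, end) into cached hits and missing gaps."""
        hits, gaps, cursor = [], [], start
        for s, data in self._entries:
            lo, hi = max(s, start), min(s + len(data), end)
            if lo >= hi:
                continue  # this entry does not overlap the query
            if lo > cursor:
                gaps.append((cursor, lo))  # uncached stretch before this hit
            hits.append((lo, hi))
            cursor = hi
        if cursor < end:
            gaps.append((cursor, end))  # uncached tail of the query
        return hits, gaps

    def _insert(self, start, data):
        """Add a fetched range and coalesce entries that touch end to start."""
        self._entries.append((start, list(data)))
        self._entries.sort(key=lambda entry: entry[0])
        merged = []
        for s, d in self._entries:
            if merged and merged[-1][0] + len(merged[-1][1]) == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + d)
            else:
                merged.append((s, d))
        self._entries = merged

    def read(self, start, end, fetch):
        """Return the data for [start, end), calling fetch only for the gaps."""
        _, gaps = self.plan(start, end)
        for g_start, g_end in gaps:
            self._insert(g_start, fetch(g_start, g_end))
        out = []
        for s, data in self._entries:
            lo, hi = max(s, start), min(s + len(data), end)
            if lo < hi:
                out.extend(data[lo - s:hi - s])
        return out


# Usage: a list stands in for slow backing storage.
storage = list(range(100))
fetched = []  # records which ranges actually went to storage


def fetch_from_storage(s, e):
    fetched.append((s, e))
    return storage[s:e]


demo = OverlapCache()
demo.read(10, 20, fetch_from_storage)  # cold: fetches the whole range
demo.read(15, 30, fetch_from_storage)  # reuses 15..19, fetches only 20..29
```

A plain key-value cache keyed on `(15, 30)` would miss on the second query even though a third of the requested data is already cached. The linear scan in `plan` is the simplest possible lookup and would be replaced by a proper index in any realistic setting.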
| Original language | English |
|---|---|
| Title of host publication | HotStorage 2025 - Proceedings of the 2025 17th ACM Workshop on Hot Topics in Storage and File Systems |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 129-136 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400719479 |
| State | Published - Jul 10 2025 |
| Event | 17th ACM Workshop on Hot Topics in Storage and File Systems, HotStorage 2025 - Boston, United States; Duration: Jul 10 2025 → Jul 11 2025 |
Publication series
| Name | HotStorage 2025 - Proceedings of the 2025 17th ACM Workshop on Hot Topics in Storage and File Systems |
|---|---|
Conference
| Conference | 17th ACM Workshop on Hot Topics in Storage and File Systems, HotStorage 2025 |
|---|---|
| Country/Territory | United States |
| City | Boston |
| Period | 07/10/25 → 07/11/25 |
Funding
We extend our sincere gratitude to the anonymous reviewers for their constructive feedback. We also acknowledge the members of the ASU-IDI Lab for their thoughtful comments and contributions. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program, under the RAPIDS Institute. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was partially funded by the National Science Foundation under Grant Numbers #2412436 and #2443219.
Keywords
- Cache Framework
- Partial Overlapped Data
- Proactive Caching