CAESAR: A Unified Framework for Foundation and Generative Models for Efficient Compression of Scientific Data

Research output: Contribution to journal › Article › peer-review

Abstract

We introduce CAESAR (Conditional AutoEncoder with Super-resolution for Augmented Reduction), a new framework for scientific data reduction. The baseline model, CAESAR-V, is built on a standard variational autoencoder with scale hyperpriors and super-resolution modules to achieve high compression. It encodes data into a latent space and uses learned priors for compact, information-rich representations. The enhanced version, CAESAR-D, begins by compressing keyframes using an autoencoder and extends the architecture by incorporating conditional diffusion to interpolate the latent spaces of missing frames between keyframes. This enables high-fidelity reconstruction of intermediate data without requiring their explicit storage. By distinguishing CAESAR-V (variational) from CAESAR-D (diffusion-enhanced), we offer a modular family of solutions that balance compression efficiency, reconstruction accuracy, and computational cost for scientific data workflows. Additionally, we develop a GPU-accelerated postprocessing module that enforces error bounds on the reconstructed data, achieving real-time compression while maintaining rigorous accuracy guarantees. Experimental results across multiple scientific datasets demonstrate that our framework achieves compression ratios up to 10× higher than those of rule-based compressors such as SZ3. This work provides a scalable, domain-adaptive solution for efficient storage and transmission of large-scale scientific simulation data.
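
The abstract does not say how the postprocessing module enforces its error bounds. As an illustration only, the sketch below shows one common way a pointwise absolute error bound can be guaranteed on top of a lossy reconstruction: wherever the residual exceeds the bound, store a quantized correction with step size 2·eb and add it back at decode time. The function names `enforce_error_bound` and `apply_corrections`, the sparse dictionary of corrections, and the CPU-only pure-Python form are all assumptions made for this example, not the CAESAR implementation.

```python
def enforce_error_bound(original, reconstructed, eb):
    """Hypothetical sketch: make |original - corrected| <= eb pointwise.

    Returns a sparse dict {index: integer quantization code} plus the
    corrected reconstruction. Only points that violate the bound get a code.
    """
    step = 2.0 * eb  # quantization bin width; rounding error is at most eb
    corrections = {}
    corrected = list(reconstructed)
    for i, (x, y) in enumerate(zip(original, reconstructed)):
        residual = x - y
        if abs(residual) > eb:
            code = round(residual / step)  # nearest bin index
            corrections[i] = code
            corrected[i] = y + code * step
    return corrections, corrected


def apply_corrections(reconstructed, corrections, eb):
    """Decoder side: rebuild the corrected data from the stored codes."""
    step = 2.0 * eb
    corrected = list(reconstructed)
    for i, code in corrections.items():
        corrected[i] = reconstructed[i] + code * step
    return corrected


# Toy data chosen so that all arithmetic is exact in floating point.
original = [1.0, 2.0, 3.0, 10.0]
reconstructed = [1.25, 2.0, 5.0, 6.75]
eb = 0.5

corrections, corrected = enforce_error_bound(original, reconstructed, eb)
max_error = max(abs(x - c) for x, c in zip(original, corrected))
```

In this toy run only the last two points violate the bound, so only two integer codes need to be stored alongside the compressed representation. With general floating-point inputs the guarantee holds up to rounding error, so a practical implementation would typically quantize against a slightly tightened bound.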

Original language: English
Article number: 8977
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 16
State: Published - Aug 2025

Funding

This research was funded by the U.S. Department of Energy under Grant Nos. DE-SC0021320 and DE-SC0022265.

Keywords

  • error bound guarantees
  • foundation model
  • generative AI
  • machine learning
  • scientific data reduction
