Error-Bounded Learned Scientific Data Compression with Preservation of Derived Quantities

Jaemoon Lee, Qian Gong, Jong Choi, Tania Banerjee, Scott Klasky, Sanjay Ranka, Anand Rangarajan

Research output: Contribution to journal > Article > peer-review

13 Scopus citations

Abstract

Scientific applications continue to grow and produce extremely large amounts of data, which require efficient compression algorithms for long-term storage. Compression errors in scientific applications can have a deleterious impact on downstream processing, so it is crucial to preserve all the “known” Quantities of Interest (QoI) during compression. Most existing approaches bound the reconstruction error of the original data, or primary data (PD), but cannot directly control the error in the QoI. In this work, we propose a physics-informed compression technique composed of two parts: (i) reduction of the PD with bounded errors and (ii) preservation of the QoI. In the first step, we combine tensor decompositions, autoencoders, product quantizers, and error-bounded lossy compressors to bound the reconstruction error at high levels of compression. In the second step, we use constraint satisfaction post-processing followed by quantization to preserve the QoI. To illustrate the challenges of reducing the reconstruction errors of the PD and QoI, we focus on simulation data generated by a large-scale fusion code, XGC, which can produce tens of petabytes in a single day. The results show that our approach can achieve high compression ratios while accurately preserving the QoI within scientifically acceptable bounds.
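The two-part idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned reconstruction is replaced by a simple stand-in, the QoI is assumed to be a single linear moment with made-up weights `w`, and the function names, the data, and the bound `eps` are all hypothetical. Part (i) quantizes the residual uniformly so that the pointwise error stays within `eps`; part (ii) applies the smallest L2 correction that restores the linear QoI.

```python
import numpy as np

def quantize_residual(x, x_approx, eps):
    """Uniform residual quantization with bin width 2*eps, so |x - x_hat| <= eps."""
    codes = np.round((x - x_approx) / (2.0 * eps)).astype(np.int64)  # integers to entropy-code
    x_hat = x_approx + codes * (2.0 * eps)
    return codes, x_hat

def correct_linear_qoi(x_hat, w, target):
    """Smallest L2 change to x_hat that makes the linear QoI w @ x equal target."""
    lam = (target - w @ x_hat) / (w @ w)  # one scalar the encoder would need to store
    return x_hat + lam * w, lam

# Hypothetical 1-D data; a real pipeline would operate on XGC output.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * t) + 0.05 * rng.normal(size=t.size)
x_approx = np.sin(2 * np.pi * t)  # stand-in for the learned (autoencoder) reconstruction
eps = 0.01                        # assumed pointwise error bound
w = t ** 2                        # assumed weights defining a moment-like QoI

codes, x_hat = quantize_residual(x, x_approx, eps)
max_err = np.max(np.abs(x - x_hat))         # stays within eps up to rounding

x_corr, lam = correct_linear_qoi(x_hat, w, target=w @ x)
qoi_gap = abs(w @ x_corr - w @ x)           # near zero up to rounding
```

In this sketch the QoI correction is applied after the error-bounded step, so individual points of `x_corr` may move slightly outside `eps`. The abstract also mentions quantizing the post-processing output, which is omitted here for brevity.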

Original language: English
Article number: 6718
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 13
State: Published - Jul 1 2022

Funding

Funding: This research was partially supported by DOE DE-SC0022265 and DOE DE-SC0021320 RAPIDS2.

Acknowledgments: The authors acknowledge the DOE (Grant No. DE-SC0022265) and DOE RAPIDS2 (Grant No. DE-SC0021320) for funding this project.

Funder                       Funder number
DOE RAPIDS2                  DE-SC0021320
U.S. Department of Energy    DE-SC0022265, DE-SC0021320 RAPIDS2

    Keywords

    • autoencoders
    • constraint satisfaction
    • data compression
    • error guarantees
    • fusion application
    • moment preservation
    • quantization
