Stability-preserving Lossy Compression for Large-scale Partial Differential Equations

  • Qian Gong
  • Mark Ainsworth
  • Jieyang Chen
  • Xin Liang
  • Liangji Zhu
  • Ethan Klasky
  • Tushar Athawale
  • Qing Liu
  • Anand Rangarajan
  • Sanjay Ranka
  • Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Checkpoint/Restart (C/R) strategies are vital for fault tolerance in scientific simulations based on partial differential equations (PDEs), yet traditional checkpointing incurs significant I/O overhead. Lossy compression offers a scalable solution by reducing checkpoint data size, but conventional methods often lack control over physical invariants (e.g., energy), leading to instabilities such as oscillations or divergence in PDE systems. This paper introduces a stability-preserving compression approach tailored for PDE simulations that explicitly controls kinetic and potential energy perturbations to ensure stable restarts. Extensive experiments across diverse PDE configurations demonstrate that our method maintains numerical stability with minimal error magnification, even across multiple checkpoint-restart cycles, outperforming state-of-the-art lossy compressors. Parallel evaluations on the Frontier supercomputer show up to 8.4× improvement in checkpoint write performance and 6.3× in read performance, while maintaining relative L2 errors of ∼2e-6 throughout continued simulation. These results provide practical guidance for balancing compression accuracy, stability, and computational efficiency in large-scale PDE applications.
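
To make the quantities named in the abstract concrete, the following is a minimal illustrative sketch and not the paper's method. It assumes a 1D wave-equation state (displacement `u`, velocity `v`) with unit density, uses textbook discrete energy definitions, and substitutes a hypothetical `uniform_quantize` function for a real lossy compressor. It measures the relative L2 error and the kinetic and potential energy perturbations that a lossy round trip introduces into a checkpoint.

```python
import math

def relative_l2_error(original, reconstructed):
    """Relative L2 error ||u - u_hat||_2 / ||u||_2 for equal-length sequences."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, reconstructed)))
    den = math.sqrt(sum(a * a for a in original))
    return num / den

def kinetic_energy(velocity, dx):
    """Discrete kinetic energy 0.5 * dx * sum(v^2); unit density is an assumption."""
    return 0.5 * dx * sum(v * v for v in velocity)

def potential_energy(displacement, dx, c=1.0):
    """Discrete potential energy 0.5 * c^2 * dx * sum((du/dx)^2), forward differences."""
    return 0.5 * c * c * dx * sum(
        ((displacement[i + 1] - displacement[i]) / dx) ** 2
        for i in range(len(displacement) - 1)
    )

def uniform_quantize(values, step):
    """Hypothetical stand-in for a lossy compressor: round to multiples of `step`."""
    return [step * round(x / step) for x in values]

# Assumed test state: one sine period for displacement, cosine for velocity.
n = 257
dx = 1.0 / (n - 1)
u = [math.sin(2 * math.pi * i * dx) for i in range(n)]
v = [math.cos(2 * math.pi * i * dx) for i in range(n)]

# Simulate a lossy checkpoint round trip with a pointwise error bound of step / 2.
step = 1e-4
u_hat = uniform_quantize(u, step)
v_hat = uniform_quantize(v, step)

err_u = relative_l2_error(u, u_hat)
d_kinetic = abs(kinetic_energy(v_hat, dx) - kinetic_energy(v, dx))
d_potential = abs(potential_energy(u_hat, dx) - potential_energy(u, dx))

print("relative L2 error (u):", err_u)
print("kinetic energy perturbation:", d_kinetic)
print("potential energy perturbation:", d_potential)
```

In this sketch the potential energy depends on differences of neighbouring values divided by `dx`, so a pointwise error in `u` can be amplified by a factor on the order of `1/dx` in the gradient. This is one way to see why a pointwise error bound alone does not bound the energy perturbation.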

Original language: English
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Publisher: Association for Computing Machinery, Inc
Pages: 1992-2005
Number of pages: 14
ISBN (Electronic): 9798400714665
State: Published - Nov 15 2025
Event: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 - St. Louis, United States
Duration: Nov 16 2025 to Nov 21 2025

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025

Conference

Conference: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Country/Territory: United States
City: St. Louis
Period: 11/16/25 to 11/21/25

Funding

This research is supported in part by the U.S. Department of Energy (DOE) RAPIDS-2 SciDAC and Sirius2 projects under contract number DE-AC05-00OR22725, and by the National Science Foundation (NSF) under grants DMS-2324364, OAC-2313122, OAC-2311756, OAC-2311757, and OAC-2144403. This research used resources of the Oak Ridge Leadership Computing Facility (OLCF), which is a DOE Office of Science User Facility.

Keywords

  • Checkpoint-restart
  • large-scale PDEs
  • lossy compression
  • stability preservation
