Abstract
Dynamic adaptive mesh refinement is pivotal in high-resolution, multi-physics, multi-model simulations that must resolve physics precisely in localized regions of expansive domains. The extreme heterogeneity of today's supercomputers poses a significant challenge for dynamically adaptive codes and makes performance portability at scale essential. Our research focuses on astrophysical simulations, particularly stellar mergers, to elucidate early-universe dynamics. We present Octo-Tiger, which leverages Kokkos, HPX, and explicit SIMD to achieve portable performance at scale in complex, massively parallel, adaptive multi-physics simulations. Octo-Tiger supports diverse processors, accelerators, and network backends. Experiments demonstrate exceptional scalability across several heterogeneous supercomputers, including Perlmutter, Frontier, and Fugaku, covering the major GPU architectures as well as x86, ARM, and RISC-V CPUs. We achieve parallel efficiencies of 47.59% in a hybrid full-system run on Perlmutter (110,080 cores and 6,880 A100 GPUs, reaching 26% of HPCG peak performance) and 51.37% on Frontier (32,768 cores and 2,048 MI250X GPUs).
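The abstract names the combination of HPX (a task-based runtime), Kokkos (portable kernels), and explicit SIMD types as the key to portable performance. As a rough illustration of how these layers compose, here is a minimal sketch, not Octo-Tiger source: an HPX future wraps a Kokkos kernel whose inner loop uses Kokkos' experimental SIMD types. The kernel (`vector_add`), problem size, and variable names are illustrative assumptions.

```cpp
#include <hpx/hpx_main.hpp>  // wraps main() so it runs inside the HPX runtime
#include <hpx/future.hpp>

#include <Kokkos_Core.hpp>
#include <Kokkos_SIMD.hpp>

using simd_t = Kokkos::Experimental::simd<double>;

// Illustrative kernel (not from Octo-Tiger): adds two fields one SIMD
// lane-width at a time. On a GPU execution space the simd type degrades
// gracefully to scalar lanes, so the same source compiles everywhere.
void vector_add(Kokkos::View<double*> a, Kokkos::View<double*> b,
                Kokkos::View<double*> c) {
  const int width = static_cast<int>(simd_t::size());
  const int n = static_cast<int>(a.extent(0));
  Kokkos::parallel_for(
      "vector_add", Kokkos::RangePolicy<>(0, n / width),
      KOKKOS_LAMBDA(const int i) {
        simd_t x, y;
        x.copy_from(&a(i * width), Kokkos::Experimental::element_aligned_tag());
        y.copy_from(&b(i * width), Kokkos::Experimental::element_aligned_tag());
        (x + y).copy_to(&c(i * width), Kokkos::Experimental::element_aligned_tag());
      });
  Kokkos::fence();  // results are visible once the wrapping future is ready
}

int main() {
  Kokkos::ScopeGuard guard;  // pairs Kokkos::initialize/finalize with main()

  const int n = 1024;  // illustrative size, divisible by any SIMD lane width
  Kokkos::View<double*> a("a", n), b("b", n), c("c", n);
  Kokkos::deep_copy(a, 1.0);
  Kokkos::deep_copy(b, 2.0);

  // Run the kernel as an HPX task so that, in a real AMR code, communication
  // and other kernels could overlap with it instead of blocking a thread.
  hpx::future<void> done = hpx::async(vector_add, a, b, c);
  done.get();  // join the asynchronous kernel before exiting

  return 0;
}
```

The design point this sketch gestures at: the HPX future makes the kernel composable with other asynchronous work rather than tying up an OS thread, while Kokkos and its SIMD abstraction keep a single kernel source portable across the CPU and GPU architectures listed in the abstract.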
| Field | Value |
|---|---|
| Original language | English |
| Article number | 10943420251386503 |
| Journal | International Journal of High Performance Computing Applications |
| DOIs | |
| State | Accepted/In press - 2025 |
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article.

This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory and operated under Contract No. DE-AC02-05CH11231, using NERSC award DDR-ERCAP0028472.

This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

This research used computational resources of the supercomputer Fugaku provided by the RIKEN Center for Computational Science.

This work was supported by the U.S. Department of Energy through Los Alamos National Laboratory (LANL). LANL is operated by Triad National Security, LLC, for the National Nuclear Security Administration of the U.S. Department of Energy (Contract No. 89233218CNA000001). We also thank the LANL Advanced Simulation & Computing Program and the CCS-7 Darwin cluster for computational resources.

The authors thank Stony Brook Research Computing and Cyberinfrastructure and the Institute for Advanced Computational Science at Stony Brook University for access to the innovative high-performance Ookami computing system, which was made possible by a $5M National Science Foundation grant (#1927880).

The support we received from the Center for Computation and Technology at Louisiana State University was invaluable. The authors also acknowledge the technical support received from NVIDIA (Scot Halverson) in the early stages of the project.

Assigned: LA-UR-24-23457 (Rev. 3).
Keywords
- HPX
- Kokkos
- adaptive mesh refinement
- asynchronous-many-task systems
- exascale computing
- high performance computing
- stellar merger