Early experiences on the OLCF Frontier system with AthenaPK and Parthenon-Hydro

John K. Holmen, Philipp Grete, Verónica G. Melesse Vergara

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

The Oak Ridge Leadership Computing Facility (OLCF) has been preparing the nation's first exascale system, Frontier, for production and end users. Frontier is based on HPE Cray's new EX architecture and Slingshot interconnect and features 74 cabinets of 3rd Gen AMD EPYC CPUs optimized for HPC and AI and AMD Instinct MI250X accelerators. As part of this preparation, “real-world” user codes have been selected to help assess the functionality, performance, and usability of the system. This article describes early experiences using the system in collaboration with the Hamburg Observatory for two selected codes, which have since been adopted in the OLCF test harness. Experiences discussed include efforts to resolve performance variability and per-cycle slowdowns. Results are shown for a performance portable astrophysical magnetohydrodynamics code, AthenaPK, and for Parthenon-Hydro, a mini-application stressing the core functionality of a performance portable block-structured adaptive mesh refinement framework. These results show good scaling characteristics to the full system. At the largest scale, the Parthenon-Hydro miniapp reaches a total of (Formula presented.) zone-cycles/s on 9216 nodes (73,728 logical GPUs) at (Formula presented.) 92% weak scaling parallel efficiency (starting from a single node using a second-order, finite-volume method).
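The weak scaling parallel efficiency quoted in the abstract can be read as the per-node throughput at scale divided by the per-node throughput of the single-node baseline. The following is a minimal sketch of that arithmetic, not the authors' measurement procedure. The function name and the throughput values are hypothetical placeholders, since the record elides the measured zone-cycles/s; only the node count, the logical GPU count, and the 92% figure come from the abstract.

```python
def weak_scaling_efficiency(base_rate, base_nodes, scaled_rate, scaled_nodes):
    """Return weak-scaling efficiency from aggregate throughput.

    Rates are aggregate throughputs (for example zone-cycles/s) measured
    with a fixed amount of work per node. Efficiency is the ratio of
    per-node throughput at scale to per-node throughput at the baseline.
    """
    if min(base_rate, base_nodes, scaled_rate, scaled_nodes) <= 0:
        raise ValueError("rates and node counts must be positive")
    per_node_base = base_rate / base_nodes
    per_node_scaled = scaled_rate / scaled_nodes
    return per_node_scaled / per_node_base


# Figures taken from the abstract.
NODES = 9216
LOGICAL_GPUS = 73_728
gpus_per_node = LOGICAL_GPUS // NODES  # logical GPUs per node

# Hypothetical placeholder throughputs in arbitrary units (NOT measured data):
# one node delivers 1.0, and the full run delivers 92% of perfect scaling.
base_rate = 1.0
scaled_rate = 0.92 * NODES * base_rate

efficiency = weak_scaling_efficiency(base_rate, 1, scaled_rate, NODES)
print(f"logical GPUs per node: {gpus_per_node}")
print(f"weak scaling efficiency: {efficiency:.2%}")
```

The node and GPU counts imply eight logical GPUs per node, which matches four MI250X accelerators per node, each exposing two graphics compute dies.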

Original language: English
Article number: e8069
Journal: Concurrency and Computation: Practice and Experience
Volume: 36
Issue number: 13
State: Published - Jun 10 2024

Funding

The authors would like to thank the Parthenon and Kokkos communities for being open, approachable and supportive. The authors would also like to thank the OLCF for early access to Frontier. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE‐AC05‐00OR22725. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska‐Curie Grant agreement No. 101030214.

Funders              Funder number
Office of Science    DE‐AC05‐00OR22725
Horizon 2020         101030214

Keywords

• adaptive mesh refinement
• high-performance computing
• parallel computing
• performance portability
