TY - GEN
T1 - Performance Porting the ExaStar Multi-Physics App Thornado On Heterogeneous Systems - A Fortran-OpenMP Code-Base Evaluation
AU - Thavappiragasam, Mathialakan
AU - Harris, J. Austin
AU - Endeve, Eirik
AU - Videau, Brice
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - The heterogeneity of HPC systems requires efficient host-to-device porting of compute kernels and high-bandwidth data communication. This capability varies from one system to another depending on system architecture and environment. New vendors such as AMD and Intel are entering the GPU field, creating a software portability challenge. Major scientific simulation code bases rely on Fortran and require portable programming models for performance porting to HPC systems with high software productivity. Even though OpenMP target offloading features support portability, most Fortran-OpenMP code bases face significant challenges. Hence, in this work, we evaluate a) the computing capability of heterogeneous systems for Fortran-OpenMP-based multi-physics code bases, and b) the performance portability of the astrophysical supernova simulation code Flash-X on heterogeneous systems. Three HPC systems were chosen for this study: Sunspot, a test bed for the Intel PVC GPU-based supercomputer Aurora, and Polaris, an NVIDIA system accelerated by A100 GPUs, both located at the Argonne Leadership Computing Facility (ALCF); and the AMD MI250-based Frontier at the Oak Ridge Leadership Computing Facility (OLCF). We discuss challenges and solutions for performance porting the compute-intensive module Thornado, which can be incorporated as an external library in Flash-X to model neutrino transport. We show that the performance of test apps improved by approximately 24× through a combination of relevant optimization strategies and compiler and system updates. Further, this study helped improve the Intel oneAPI OpenMP compiler by providing bug reports and reproducers internally.
KW - Concurrent multi-tasking
KW - Fortran-OpenMP code base
KW - Intel’s PVC GPU
KW - Multi-physics toolkit
KW - OpenACC
KW - OpenMP target offloading
KW - Porting to heterogeneous systems
UR - http://www.scopus.com/inward/record.url?scp=85205390649&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-72567-8_2
DO - 10.1007/978-3-031-72567-8_2
M3 - Conference contribution
AN - SCOPUS:85205390649
SN - 9783031725661
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 30
BT - Advancing OpenMP for Future Accelerators - 20th International Workshop on OpenMP, IWOMP 2024, Proceedings
A2 - Espinosa, Alexis
A2 - Cytowski, Maciej
A2 - Klemm, Michael
A2 - de Supinski, Bronis R.
A2 - Klinkenberg, Jannis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 20th International Workshop on OpenMP, IWOMP 2024
Y2 - 23 September 2024 through 25 September 2024
ER -