TY - GEN
T1 - Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations
AU - Martin, Aristotle
AU - Liu, Geng
AU - Ladd, William
AU - Lee, Seyong
AU - Gounley, John
AU - Vetter, Jeffrey
AU - Patel, Saumil
AU - Rizzi, Silvio
AU - Mateevitsi, Victor
AU - Insley, Joseph
AU - Randles, Amanda
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/11/12
Y1 - 2023/11/12
N2 - Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application along with a custom performance model to inform the results and identify additional optimization strategies. At-scale performance of the programming model implementations is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on current NVIDIA device-based systems Summit and Polaris. Real-world workloads representing 3D blood flow calculations in complex vasculature are assessed. Our analysis highlights critical trade-offs between code performance, portability, and development time.
AB - Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application along with a custom performance model to inform the results and identify additional optimization strategies. At-scale performance of the programming model implementations is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on current NVIDIA device-based systems Summit and Polaris. Real-world workloads representing 3D blood flow calculations in complex vasculature are assessed. Our analysis highlights critical trade-offs between code performance, portability, and development time.
KW - Computational fluid dynamics
KW - Performance portability
KW - Proxy applications
UR - http://www.scopus.com/inward/record.url?scp=85178118548&partnerID=8YFLogxK
U2 - 10.1145/3624062.3624188
DO - 10.1145/3624062.3624188
M3 - Conference contribution
AN - SCOPUS:85178118548
T3 - ACM International Conference Proceeding Series
SP - 1126
EP - 1137
BT - Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
PB - Association for Computing Machinery
T2 - 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Y2 - 12 November 2023 through 17 November 2023
ER -