Abstract
Preparing for the deployment of large scientific and engineering codes on upcoming exascale systems with GPU-dense nodes is made challenging by the unprecedented diversity of device architectures and heterogeneous programming models. In this work, we evaluate the process of porting a massively parallel fluid dynamics code written in CUDA to SYCL, HIP, and Kokkos with a range of backends, using a combination of automated tools and manual tuning. We use a proxy application along with a custom performance model to inform the results and identify additional optimization strategies. At-scale performance of the programming model implementations is evaluated on pre-production GPU node architectures for Frontier and Aurora, as well as on the current NVIDIA GPU-based systems Summit and Polaris. Real-world workloads representing 3D blood flow calculations in complex vasculature are assessed. Our analysis highlights critical trade-offs between code performance, portability, and development time.
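The abstract itself contains no code; as a rough illustration of the kernel-level translation such a port involves, the sketch below shows a minimal axpy-style CUDA kernel and, in comments, the near one-to-one HIP form that an automated tool such as hipify typically emits. All names in it (`axpy_kernel`, `launch_axpy`, `n`, `a`, `x`, `y`) are hypothetical and are not taken from the fluid dynamics code described in the paper.

```cuda
// Illustrative sketch only, not code from the paper: a minimal axpy-style
// CUDA kernel and, in comments, its HIP counterpart. All identifiers are
// hypothetical.
#include <cuda_runtime.h>

__global__ void axpy_kernel(int n, double a, const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

void launch_axpy(int n, double a, const double* x, double* y) {
    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    axpy_kernel<<<blocks, threads>>>(n, a, x, y);  // CUDA launch syntax
    cudaDeviceSynchronize();
}

// HIP version: the __global__ kernel body is unchanged; the host side swaps
// headers and runtime calls, e.g.
//   #include <hip/hip_runtime.h>
//   hipLaunchKernelGGL(axpy_kernel, dim3(blocks), dim3(threads), 0, 0,
//                      n, a, x, y);
//   hipDeviceSynchronize();
// SYCL and Kokkos ports instead restructure the launch as a parallel_for
// over an execution policy, rather than renaming runtime calls.
```

Even in this toy case, the HIP translation is largely mechanical, whereas SYCL and Kokkos require rethinking kernel launch and data management, which is where the manual tuning effort discussed in the paper concentrates.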
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
| Publisher | Association for Computing Machinery |
| Pages | 1126-1137 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400707858 |
| DOIs | |
| State | Published - Nov 12 2023 |
| Event | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States; Nov 12 2023 → Nov 17 2023 |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
|---|---|
| Country/Territory | United States |
| City | Denver |
| Period | 11/12/23 → 11/17/23 |
Funding
Research reported in this work was supported by the National Institutes of Health under Award Numbers U01CA253511 and T32GM144291, and by the ALCF Aurora Early Science Program. The content does not necessarily represent the official views of the NIH. This research used resources of the Argonne Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. An award of compute time was provided by the INCITE program. This research also used resources of the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.
Keywords
- Computational fluid dynamics
- Performance portability
- Proxy applications