Abstract
We explore the performance and portability of high-level programming models: the LLVM-based Julia and Python/Numba, and Kokkos, on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphics processing units (GPUs) on Frontier's test bed Crusher system, and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facility. We compare the default performance of a hand-rolled dense matrix multiplication algorithm on CPUs against vendor-compiled C/OpenMP implementations, and on each GPU against CUDA and HIP. Rather than focusing on kernel optimization per se, we select this naive approach to resemble exploratory work in science and to serve as a lower bound for performance that isolates the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs. Performance gaps are identified on NVIDIA A100 GPUs for Julia's single precision and for Kokkos, and for Python/Numba in all scenarios. We also comment on half-precision support, productivity, performance portability metrics, and platform readiness. We expect to contribute to the understanding and direction of high-level, high-productivity languages in HPC as the first-generation exascale systems are deployed.
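For context, the following is a minimal sketch of what a hand-rolled, naive (triple-loop) dense matrix multiplication might look like in one of the evaluated models, Python/Numba. This is an illustrative assumption, not the authors' benchmark code; the function name `matmul_naive` and the problem size are hypothetical.

```python
# Hypothetical sketch (not the paper's code): a naive, hand-rolled dense
# matrix multiplication kernel written with Python/Numba.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def matmul_naive(A, B, C):
    """C = A @ B via the textbook triple loop, parallelized over rows."""
    n, k = A.shape
    m = B.shape[1]
    for i in prange(n):          # outer loop distributed across threads
        for j in range(m):
            acc = 0.0
            for l in range(k):   # innermost reduction, no blocking or tiling
                acc += A[i, l] * B[l, j]
            C[i, j] = acc

# Example usage in single precision, one of the data types discussed in the paper.
n = 512
A = np.random.rand(n, n).astype(np.float32)
B = np.random.rand(n, n).astype(np.float32)
C = np.zeros((n, n), dtype=np.float32)
matmul_naive(A, B, C)
```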
Original language | English
---|---
Title of host publication | 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Publisher | Institute of Electrical and Electronics Engineers Inc.
Pages | 373-382
Number of pages | 10
ISBN (Electronic) | 9798350311990
DOIs |
State | Published - 2023
Event | 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 - St. Petersburg, United States. Duration: May 15 2023 → May 19 2023
Publication series
Name | 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
---|---
Conference
Conference | 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 |
---|---
Country/Territory | United States |
City | St. Petersburg |
Period | 05/15/23 → 05/19/23 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).
Keywords
- Exascale
- GPU
- HPC
- Julia
- Kokkos
- LLVM
- OpenMP
- Performance
- Portability
- Python/Numba