Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes

William F. Godoy, Pedro Valero-Lara, T. Elise Dettling, Christian Trefftz, Ian Jorquera, Thomas Sheehy, Ross G. Miller, Marc Gonzalez-Tallada, Jeffrey S. Vetter, Valentin Churavy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

We explore the performance and portability of the high-level programming models: the LLVM-based Julia and Python/Numba, and Kokkos on high-performance computing (HPC) nodes: AMD Epyc CPUs and MI250X graphics processing units (GPUs) on Frontier's test bed Crusher system, and Ampere's Arm-based CPUs and NVIDIA's A100 GPUs on the Wombat system at the Oak Ridge Leadership Computing Facility. We compare the default performance of a hand-rolled dense matrix multiplication algorithm on CPUs against vendor-compiled C/OpenMP implementations, and on each GPU against CUDA and HIP. Rather than focusing on kernel optimization per se, we select this naive approach to resemble exploratory work in science and as a lower bound for performance to isolate the effect of each programming model. Julia and Kokkos perform comparably with C/OpenMP on CPUs, while Julia implementations are competitive with CUDA and HIP on GPUs. Performance gaps are identified on NVIDIA A100 GPUs for Julia's single precision and Kokkos, and for Python/Numba in all scenarios. We also comment on half-precision support, productivity, performance portability metrics, and platform readiness. We expect to contribute to the understanding and direction for high-level, high-productivity languages in HPC as the first-generation exascale systems are deployed.
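The benchmark kernel described above is a hand-rolled dense matrix multiplication. A minimal sketch of such a kernel in pure Python follows; this is illustrative only, not the authors' code. In the paper's Python/Numba variant the function would be JIT-compiled (e.g. with `numba.njit` over NumPy arrays), but the triple-loop structure is the same:

```python
def matmul(A, B):
    """Naive dense matrix multiplication C = A * B for square
    lists of lists: the O(n^3) triple loop used as a performance
    lower bound across programming models."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):            # row of A
        for j in range(n):        # column of B
            acc = 0.0
            for k in range(n):    # inner (dot) product
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C

# Example: 2x2 multiply
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)  # [[19.0, 22.0], [43.0, 50.0]]
```

Because the kernel is deliberately unoptimized (no blocking, no vectorization hints), differences in measured throughput reflect what each programming model's compiler and runtime do by default, which is the comparison the study targets.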

Original language: English
Title of host publication: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 373-382
Number of pages: 10
ISBN (Electronic): 9798350311990
DOIs
State: Published - 2023
Event: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023 - St. Petersburg, United States
Duration: May 15 2023 - May 19 2023

Publication series

Name: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023

Conference

Conference: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2023
Country/Territory: United States
City: St. Petersburg
Period: 05/15/23 - 05/19/23

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (https://energy.gov/downloads/doe-public-access-plan).

Funders / Funder number
DOE Public Access Plan
U.S. Department of Energy: DE-AC05-00OR22725
Office of Science
National Nuclear Security Administration

Keywords

• Exascale
• GPU
• HPC
• Julia
• Kokkos
• LLVM
• OpenMP
• Performance
• Portability
• Python/Numba

