Abstract
As the supercomputing landscape diversifies, solutions such as Kokkos for writing vendor-agnostic applications and libraries have risen in popularity. Kokkos provides a programming model designed for performance portability, which allows developers to write a single-source implementation that can run efficiently on various architectures. At its heart, Kokkos maps parallel algorithms to architecture- and vendor-specific backends written in lower-level programming models such as CUDA and HIP. Another approach to writing vendor-agnostic parallel code is OpenMP’s directive-based approach, which lets developers annotate code to express parallelism. It is implemented at the compiler level and is supported by all major high-performance computing vendors, as well as by the primary open-source toolchains, GNU and LLVM. Since its inception, Kokkos has used OpenMP to parallelize on CPU architectures. In this paper, we explore leveraging OpenMP for a GPU backend and discuss the challenges we encountered when mapping the Kokkos APIs and semantics to OpenMP target constructs. As an exemplar workload, we chose a simple conjugate gradient solver for sparse matrices. We find that performance on NVIDIA and AMD GPUs varies widely depending on the details of the implementation strategy and the chosen compiler. Furthermore, the performance of the OpenMP implementations decreases as the complexity of the investigated algorithms increases.
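The exemplar workload described above, a conjugate gradient (CG) solve for sparse matrices, is dominated by two kernels: a sparse matrix-vector product and dot products. The sketch below is not taken from the paper; the function names, data layout, and test matrix are assumptions made for illustration. It shows how such kernels can be written with OpenMP target constructs of the kind the abstract refers to, with the roughly corresponding Kokkos patterns noted in comments.

```cpp
// Illustrative sketch only: a CRS sparse matrix-vector product and a dot product,
// the two kernels at the core of a CG iteration, expressed with OpenMP target
// offload. Names, layout, and data are assumptions for this example, not the
// authors' code.
#include <cstdio>
#include <vector>

// y = A * x for a square matrix A in compressed row storage (CRS) format.
// Comparable in spirit to a Kokkos::parallel_for over the matrix rows.
void spmv(int nrows, int nnz, const int* row_ptr, const int* col_idx,
          const double* values, const double* x, double* y) {
  #pragma omp target teams distribute parallel for \
      map(to: row_ptr[0:nrows + 1], col_idx[0:nnz], values[0:nnz], x[0:nrows]) \
      map(from: y[0:nrows])
  for (int i = 0; i < nrows; ++i) {
    double sum = 0.0;
    for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
      sum += values[j] * x[col_idx[j]];
    y[i] = sum;
  }
}

// <a, b>, the reduction that appears in every CG iteration.
// Comparable in spirit to a Kokkos::parallel_reduce.
double dot(int n, const double* a, const double* b) {
  double result = 0.0;
  #pragma omp target teams distribute parallel for reduction(+ : result) \
      map(to: a[0:n], b[0:n]) map(tofrom: result)
  for (int i = 0; i < n; ++i)
    result += a[i] * b[i];
  return result;
}

int main() {
  // Tiny 4x4 tridiagonal test system, just to exercise the kernels.
  const int n = 4;
  std::vector<int> row_ptr = {0, 2, 5, 8, 10};
  std::vector<int> col_idx = {0, 1, 0, 1, 2, 1, 2, 3, 2, 3};
  std::vector<double> values = {2, -1, -1, 2, -1, -1, 2, -1, -1, 2};
  std::vector<double> x(n, 1.0), y(n, 0.0);

  spmv(n, static_cast<int>(values.size()), row_ptr.data(), col_idx.data(),
       values.data(), x.data(), y.data());
  std::printf("y = [%g %g %g %g], <y, y> = %g\n",
              y[0], y[1], y[2], y[3], dot(n, y.data(), y.data()));
  return 0;
}
```

Built with an offload-capable compiler (for example, a recent Clang/LLVM with `-fopenmp -fopenmp-targets=...`, or a vendor toolchain), the annotated loops execute on the GPU; without offload support the same code falls back to the host, which reflects the portability argument made in the abstract.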
Original language | English |
---|---|
Title of host publication | OpenMP |
Subtitle of host publication | Advanced Task-Based, Device and Compiler Programming - 19th International Workshop on OpenMP, IWOMP 2023, Proceedings |
Editors | Simon McIntosh-Smith, Tom Deakin, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 99-113 |
Number of pages | 15 |
ISBN (Print) | 9783031407437 |
DOIs | |
State | Published - 2023 |
Event | 19th International Workshop on OpenMP, IWOMP 2023 - Bristol, United Kingdom. Duration: Sep 13, 2023 → Sep 15, 2023
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14114 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 19th International Workshop on OpenMP, IWOMP 2023
---|---|
Country/Territory | United Kingdom |
City | Bristol |
Period | 09/13/23 → 09/15/23 |
Funding
- Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title, and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.
- This work was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative.
- This research used resources of the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
- This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative. The views and opinions of the authors do not necessarily reflect those of the U.S. Government or Lawrence Livermore National Security, LLC, neither of whom, nor any of their employees, makes any endorsements, express or implied warranties or representations, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of the information contained herein. This work was in part prepared by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-827970). We also gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation at Argonne National Laboratory.
- This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, in particular its subproject SOLLVE.
- This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative, and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.
- This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725, and of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231.
- This research was supported by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev, Israel; Intel Corporation (oneAPI CoE program); and the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [5] and Intel Developer Cloud [26]. The authors thank Re’em Harel, Israel Hen, and Gabi Dadush for their help and support.
- This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. We acknowledge the Danish e-Infrastructure Cooperation (DeiC), Denmark, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking and hosted by CSC (Finland) and the LUMI consortium, through DeiC, Denmark, Compiler development (DeiC-DTU-N5-20230033). Lastly, we acknowledge DCC [4] for providing access to compute resources.
- Prepared by LLNL under Contract DE-AC52-07NA27344 (LLNL-CONF-849438) and supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research.
- This work was partially supported by DeiC National HPC (g.a. DeiC-DTU-N5-20230033) and by the “Compiler development” project (g.a. DeiC-DTU-N5-20230033).
- This work is supported by the São Paulo Research Foundation (grants 18/07446-8, 20/01665-0, and 18/15519-5).
Funders | Funder number |
---|---|
Data Science Research Center | |
DeiC National HPC | DeiC-DTU-N5-20230033 |
European High-Performance Computing Joint Undertaking | 951732 |
Intel Developer Cloud | |
Lynn and William Frankel Center for Computer Science | |
Office of Advanced Scientific Computing Research | DE-AC02-06CH11357
Office of Science and National Nuclear Security Administration | |
U.S. Department of Energy organizations | |
U.S. Government | |
U.S. Department of Energy | 17-SC-20-SC |
Division of Chemistry | |
Intel Corporation | |
Office of Science | DE-AC05-00OR22725 |
National Nuclear Security Administration | DE-NA-0003525 |
Advanced Scientific Computing Research | |
Lawrence Livermore National Laboratory | LLNL-CONF-827970, DE-AC52-07NA27344, LLNL-CONF-849438
Lawrence Berkeley National Laboratory | DE-AC02-05CH11231 |
Fundação de Amparo à Pesquisa do Estado de São Paulo | 20/01665-0, 18/07446-8, 18/15519-5 |
Ben-Gurion University of the Negev | |
Council for Higher Education | 
Keywords
- GPUs
- Kokkos
- OpenMP
- parallel programming
- performance portability