The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned

Rahulkumar Gayatri, Stephen L. Olivier, Christian R. Trott, Johannes Doerfert, Jan Ciesko, Damien Lebrun-Grandie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

As the supercomputing landscape diversifies, solutions such as Kokkos for writing vendor-agnostic applications and libraries have risen in popularity. Kokkos provides a programming model designed for performance portability, which allows developers to write a single-source implementation that can run efficiently on various architectures. At its heart, Kokkos maps parallel algorithms to architecture- and vendor-specific backends written in lower-level programming models such as CUDA and HIP. Another approach to writing vendor-agnostic parallel code is OpenMP's directive-based approach, which lets developers annotate code to express parallelism. It is implemented at the compiler level and is supported by all major high-performance computing vendors as well as the primary open-source toolchains, GNU and LLVM. Since its inception, Kokkos has used OpenMP to parallelize on CPU architectures. In this paper, we explore leveraging OpenMP for a GPU backend and discuss the challenges we encountered when mapping the Kokkos APIs and semantics to OpenMP target constructs. As an exemplar workload we chose a simple conjugate gradient solver for sparse matrices. We find that performance on NVIDIA and AMD GPUs varies widely based on details of the implementation strategy and the chosen compiler. Furthermore, the performance of the OpenMP implementations decreases with increasing complexity of the investigated algorithms.
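
To make the mapping discussed above concrete, the following minimal sketch (not code from the paper; the function names, problem size, and kernel choice are illustrative assumptions) shows the same dot-product reduction written once against the Kokkos API and once with the OpenMP target directives that a backend such as Kokkos' OpenMPTarget backend might roughly lower it to. It assumes a Kokkos installation and a compiler with OpenMP offload support.

    // Illustrative sketch only: a single-source Kokkos kernel next to a
    // hand-written OpenMP target analogue. Names and sizes are illustrative.
    #include <Kokkos_Core.hpp>
    #include <cstdio>
    #include <vector>

    // OpenMP target version: offload the loop to the default device and
    // combine per-thread partial sums with a reduction clause.
    double dot_openmp_target(const double* x, const double* y, int n) {
      double sum = 0.0;
    #pragma omp target teams distribute parallel for reduction(+ : sum) \
        map(to : x[0:n], y[0:n]) map(tofrom : sum)
      for (int i = 0; i < n; ++i) sum += x[i] * y[i];
      return sum;
    }

    int main(int argc, char* argv[]) {
      const int n = 1 << 20;

      Kokkos::initialize(argc, argv);
      {
        // Kokkos version: single source; the backend enabled at build time
        // (CUDA, HIP, OpenMPTarget, ...) decides where the kernel runs.
        Kokkos::View<double*> x("x", n), y("y", n);
        Kokkos::deep_copy(x, 1.0);
        Kokkos::deep_copy(y, 2.0);

        double dot = 0.0;
        Kokkos::parallel_reduce(
            "dot", n,
            KOKKOS_LAMBDA(const int i, double& partial) {
              partial += x(i) * y(i);
            },
            dot);
        std::printf("Kokkos dot        = %g\n", dot);
      }
      Kokkos::finalize();

      // Host-allocated data for the plain OpenMP target version.
      std::vector<double> hx(n, 1.0), hy(n, 2.0);
      std::printf("OpenMP target dot = %g\n",
                  dot_openmp_target(hx.data(), hy.data(), n));
      return 0;
    }

A full backend must cover far more of the Kokkos API and semantics than this single pattern, which is where, as the abstract notes, the implementation strategy and the chosen compiler strongly affect performance.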

Original language: English
Title of host publication: OpenMP
Subtitle of host publication: Advanced Task-Based, Device and Compiler Programming - 19th International Workshop on OpenMP, IWOMP 2023, Proceedings
Editors: Simon McIntosh-Smith, Tom Deakin, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 99-113
Number of pages: 15
ISBN (Print): 9783031407437
DOIs
State: Published - 2023
Event: Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023 - Bristol, United Kingdom
Duration: Sep 13 2023 - Sep 15 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14114 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023
Country/Territory: United Kingdom
City: Bristol
Period: 09/13/23 - 09/15/23

Funding

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.

This work was supported by Exascale Computing Project 17-SC-20-SC, a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. This research used resources of the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Acknowledgement. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative. The views and opinions of the authors do not necessarily reflect those of the U.S. Government or Lawrence Livermore National Security, LLC; neither of them nor any of their employees make any endorsements, express or implied warranties or representations, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of the information contained herein. This work was in part prepared by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-827970). We also gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation at Argonne National Laboratory. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, in particular its subproject SOLLVE.

Acknowledgements. This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative, and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.

Acknowledgments. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231.

Acknowledgments. This research was supported by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev, Israel; Intel Corporation (oneAPI CoE program); and the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [5] and Intel Developer Cloud [26]. The authors thank Re’em Harel, Israel Hen, and Gabi Dadush for their help and support.

Acknowledgment. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. We acknowledge the Danish e-Infrastructure Cooperation (DeiC), Denmark, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking and hosted by CSC (Finland) and the LUMI consortium, through DeiC, Denmark, under the Compiler development project (DeiC-DTU-N5-20230033). Lastly, we acknowledge DCC [4] for providing access to compute resources.

Acknowledgement. Prepared by LLNL under Contract DE-AC52-07NA27344 (LLNL-CONF-849438) and supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research.

Acknowledgement. This work was partially supported by DeiC National HPC (g.a. DeiC-DTU-N5-20230033) and by the “Compiler development” project (g.a. DeiC-DTU-N5-20230033). This work is supported by the São Paulo Research Foundation (grants 18/07446-8, 20/01665-0, and 18/15519-5).

Funders and funder numbers:
Data Science Research Center
DeiC National HPC: DeiC-DTU-N5-20230033
European High-Performance Computing Joint Undertaking: 951732
Intel Developer Cloud
Lynn and William Frankel Center for Computer Science
Office of Advanced Scientific Computing Research: DE-AC02-06CH11357
Office of Science and National Nuclear Security Administration
U.S. Department of Energy organizations
U.S. Government
U.S. Department of Energy: 17-SC-20-SC
Division of Chemistry
Intel Corporation
Office of Science: DE-AC05-00OR22725
National Nuclear Security Administration: DE-NA-0003525
Advanced Scientific Computing Research
Lawrence Livermore National Laboratory: LLNL-CONF-827970, DE-AC52-07NA27344, LLNL-CONF-849438
Lawrence Berkeley National Laboratory: DE-AC02-05CH11231
Fundação de Amparo à Pesquisa do Estado de São Paulo: 20/01665-0, 18/07446-8, 18/15519-5
Ben-Gurion University of the Negev
Council for Higher Education

Keywords

• GPUs
• Kokkos
• OpenMP
• parallel programming
• performance portability
