Abstract
The OpenMP specification recently introduced support for unified shared memory, allowing implementations to leverage underlying system software to provide a simpler GPU offloading model where explicit mapping of variables is optional. Support for this feature is becoming more widely available in different OpenMP implementations on several hardware platforms. A deeper understanding of the different implementations' execution profiles and performance is crucial for applications as they consider the performance portability implications of adopting a unified memory offloading programming style. This work introduces a benchmark tool to characterize unified memory support in several OpenMP compilers and runtimes, with emphasis on identifying discrepancies between different OpenMP implementations as to how their various memory allocation strategies interact with unified shared memory. The benchmark tool is used to characterize OpenMP compilers on three leading High Performance Computing platforms supporting different CPU and device architectures. It is also used to assess the impact of enabling unified shared memory on the performance of memory-bound code, highlighting implementation differences that should be accounted for when applications consider performance portability across platforms and compilers.
Original language | English |
---|---|
Title of host publication | OpenMP |
Subtitle of host publication | Advanced Task-Based, Device and Compiler Programming - 19th International Workshop on OpenMP, IWOMP 2023, Proceedings |
Editors | Simon McIntosh-Smith, Tom Deakin, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 210-225 |
Number of pages | 16 |
ISBN (Print) | 9783031407437 |
DOIs | |
State | Published - 2023 |
Event | Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023 - Bristol, United Kingdom; Duration: Sep 13 2023 → Sep 15 2023 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14114 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023 |
---|---|
Country/Territory | United Kingdom |
City | Bristol |
Period | 09/13/23 → 09/15/23 |
Funding
Acknowledgement. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative. The views and opinions of the authors do not necessarily reflect those of the U.S. government or Lawrence Livermore National Security, LLC neither of whom nor any of their employees make any endorsements, express or implied warranties or representations or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of the information contained herein. This work was in parts prepared by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-827970). We also gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation at Argonne National Laboratory.

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, in particular its subproject SOLLVE.

Acknowledgements. This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative, and the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computer Research, under Contract DE-AC02-06CH11357.

Acknowledgments. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a US Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231.

Acknowledgments. This research was supported by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev, Israel; Intel Corporation (oneAPI CoE program); and the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [5] and Intel Developer Cloud [26]. The authors thank Re’em Harel, Israel Hen, and Gabi Dadush for their help and support.

Acknowledgment. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. We acknowledge the Danish e-Infrastructure Cooperation (DeiC), Denmark, for awarding this project access to the LUMI supercomputer, owned by the EuroHPC Joint Undertaking, hosted by CSC (Finland) and the LUMI consortium through DeiC, Denmark, Compiler development (DeiC-DTU-N5-20230033). Lastly, we acknowledge DCC [4] for providing access to compute resources.

Acknowledgement. Prepared by LLNL under Contract DE-AC52-07NA27344 (LLNL-CONF-849438) and supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research.

Acknowledgement. This work was partially supported by DeiC National HPC (g.a. DeiC-DTU-N5-20230033) and by the “Compiler development” project (g.a. DeiC-DTU-N5-20230033).

This work is supported by the Sao Paulo Research Foundation (grants 18/07446-8, 20/01665-0, and 18/15519-5).
Keywords
- Offloading
- OpenMP
- Unified Shared Memory