Abstract
GPU accelerators are ubiquitous, but their ecosystem is far less evolved than that of the host. Compiler heuristics are often tuned for CPUs and reused for GPUs. Similarly, tooling and more evolved optimization techniques have historically not been available on GPU targets. In this work, we address one of these shortcomings and enable profile generation and profile-guided optimizations (PGO) for GPU targets. While this is only a single step towards a CPU-equivalent ecosystem for offload devices, it shows that old misconceptions about the limitations of GPUs are often no longer warranted. Through our implementation in LLVM/Offload, we enable device-side PGO for full scientific applications and open up tooling opportunities, including code coverage analysis and compiler-built-in roofline analysis. Our evaluation highlights the performance implications of profile generation, the insights gained from these profiles, and the (missed) opportunities in utilizing the information for GPU compilation.
| Original language | English |
|---|---|
| Title of host publication | OpenMP |
| Subtitle of host publication | Balancing Productivity and Performance Portability - 21st International Workshop on OpenMP, IWOMP 2025, Proceedings |
| Editors | Yonghong Yan, Erik Saule, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg, Swaroop Pophale |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 99-113 |
| Number of pages | 15 |
| ISBN (Print) | 9783032063427 |
| DOIs | |
| State | Published - 2026 |
| Event | 21st International Workshop on OpenMP, IWOMP 2025 - Charlotte, United States. Duration: Oct 1 2025 → Oct 3 2025 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 16123 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 21st International Workshop on OpenMP, IWOMP 2025 |
|---|---|
| Country/Territory | United States |
| City | Charlotte |
| Period | 10/1/25 → 10/3/25 |
Funding
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-2009045). This manuscript has been partially coauthored by Lawrence Livermore National Security, LLC under Contract No. DE-AC52-07NA27344 with the U.S. Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. This work was supported in part by the Advanced Simulation and Computing Program operated by the National Nuclear Security Administration. This research used resources of the Experimental Computing Laboratory (ExCL) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- GPU
- Offloading
- PGO
- Profiling
- Roofline