Profile Generation for GPU Targets

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

GPU accelerators are ubiquitous, but their ecosystem is far less evolved than the host one. Compiler heuristics are often tuned for CPUs and reused for GPUs. Similarly, tooling and more evolved optimization techniques have historically not been available on GPU targets. In this work, we address one of these shortcomings and enable profile generation and profile-guided optimizations (PGO) for GPU targets. While this is only a single step towards a CPU-equivalent ecosystem for offload devices, it shows how old misconceptions about the limitations of GPUs are often no longer warranted. Through our implementation in LLVM/Offload, we enable device-side PGO for full scientific applications and open up tooling opportunities, including code coverage analysis and compiler-built-in roofline analysis. Our evaluation highlights the performance implications of profile generation, the insights gained from these profiles, and the (missed) opportunities in utilizing the information for GPU compilation.

Original language: English
Title of host publication: OpenMP
Subtitle of host publication: Balancing Productivity and Performance Portability - 21st International Workshop on OpenMP, IWOMP 2025, Proceedings
Editors: Yonghong Yan, Erik Saule, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg, Swaroop Pophale
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 99-113
Number of pages: 15
ISBN (Print): 9783032063427
State: Published - 2026
Event: 21st International Workshop on OpenMP, IWOMP 2025 - Charlotte, United States
Duration: Oct 1 2025 → Oct 3 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16123 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 21st International Workshop on OpenMP, IWOMP 2025
Country/Territory: United States
City: Charlotte
Period: 10/1/25 → 10/3/25

Funding

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 (LLNL-CONF-2009045). This manuscript has been partially coauthored by Lawrence Livermore National Security, LLC under Contract No. DE-AC52-07NA27344 with the U.S. Department of Energy. The United States Government retains, and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. This work was supported in part by the Advanced Simulation and Computing Program operated by the National Nuclear Security Administration. This research used resources of the Experimental Computing Laboratory (ExCL) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • GPU
  • Offloading
  • PGO
  • Profiling
  • Roofline
