Characterizing the Impact of GPU Power Management on an Exascale System

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As GPU-accelerated high-performance computing (HPC) systems approach exascale performance, controlling energy consumption without compromising throughput is essential. Systems such as the AMD MI250X-based Frontier supercomputer provide runtime mechanisms, including frequency capping and power capping, that enable energy tuning without modifying application code. Although both mechanisms target energy reduction, they operate through distinct hardware control paths and affect workloads differently. We present a comprehensive evaluation of these strategies on a leadership-class system using diverse HPC proxy applications representative of production workloads. Our study analyzes performance-energy trade-offs across multiple capping levels, node counts (1 and 32), and application profiles. Results show that frequency capping generally achieves higher energy efficiency and scalability, with energy-efficiency gains of up to 13.2% without performance loss, while power capping is more effective for single-node runs or workloads with bursty GPU utilization. We also provide practical guidelines to help system administrators and users balance energy efficiency and performance in large-scale scientific workloads.
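The abstract reports energy-efficiency gains of capped runs relative to an uncapped baseline, subject to a performance constraint. The sketch below shows one way such a comparison can be computed from measured wall-clock time and average power, assuming energy is simply average power multiplied by runtime. The class and function names, the `max_slowdown` tolerance, and the numbers in the usage lines are hypothetical illustrations; they are not the paper's methodology, tooling, or measured data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RunMeasurement:
    """One application run (hypothetical record): wall-clock time in s, mean power in W."""
    label: str
    runtime_s: float
    avg_power_w: float

    @property
    def energy_j(self) -> float:
        # Energy = average power * elapsed time (W * s = J).
        return self.avg_power_w * self.runtime_s


def compare_to_baseline(base: RunMeasurement, capped: RunMeasurement,
                        max_slowdown: float = 0.0) -> dict:
    """Relative energy saving and slowdown of a capped run versus the uncapped baseline.

    max_slowdown is an assumed tolerance: the fraction of extra runtime still
    treated as "no performance loss" (0.0 means the run must not be slower).
    """
    energy_saving = 1.0 - capped.energy_j / base.energy_j
    slowdown = capped.runtime_s / base.runtime_s - 1.0
    return {
        "energy_saving": energy_saving,
        "slowdown": slowdown,
        "no_perf_loss": slowdown <= max_slowdown,
    }


def best_setting(base: RunMeasurement, candidates: Sequence[RunMeasurement],
                 max_slowdown: float = 0.0) -> Optional[str]:
    """Label of the candidate with the largest energy saving that stays within
    the slowdown tolerance, or None if no candidate qualifies."""
    eligible = [c for c in candidates
                if compare_to_baseline(base, c, max_slowdown)["no_perf_loss"]]
    if not eligible:
        return None
    return max(eligible,
               key=lambda c: compare_to_baseline(base, c)["energy_saving"]).label


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the study.
    base = RunMeasurement("uncapped", runtime_s=100.0, avg_power_w=500.0)
    freq = RunMeasurement("freq-cap", runtime_s=100.0, avg_power_w=450.0)
    power = RunMeasurement("power-cap", runtime_s=110.0, avg_power_w=420.0)
    for run in (freq, power):
        r = compare_to_baseline(base, run)
        print(f"{run.label}: saving={r['energy_saving']:.1%}, "
              f"slowdown={r['slowdown']:.1%}, no_perf_loss={r['no_perf_loss']}")
    print("best:", best_setting(base, [freq, power]))
```

Keeping the slowdown tolerance as an explicit parameter makes the "without performance loss" criterion visible: a strict tolerance of zero and a looser one (for example, a few percent) can select different capping levels for the same measurements.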

Original language: English
Title of host publication: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
Publisher: Association for Computing Machinery, Inc
Pages: 1524-1533
Number of pages: 10
ISBN (Electronic): 9798400718717
State: Published - Nov 15 2025
Event: 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops - St. Louis, United States
Duration: Nov 16 2025 - Nov 21 2025

Publication series

Name: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops

Conference

Conference: 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops
Country/Territory: United States
City: St. Louis
Period: 11/16/25 - 11/21/25

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy (DOE) under Contract No. DE-AC05-00OR22725. This study was financed in part by the CAPES - Finance Code 001, FAPERGS - PqG 24/2551-0001388-1, and CNPq. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the DOE. The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • Energy efficiency
  • Exascale Systems
  • Frequency Capping
  • GPU
  • Power Capping

