Understanding Strong Scaling on GPUs Using Empirical Performance Saturation Size

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The roofline model provides a concise overview of the maximum performance capabilities of a given computer system through a combination of peak memory bandwidth and compute performance rates. The increasing complexity of scheduling and caching in recent GPUs, however, has introduced complicated performance variability that is not captured by arithmetic intensity alone. This work examines the effect of problem size and GPU launch configurations on roofline performance for V100, A100, MI100, and MI250X graphics processing units. We introduce an extended roofline model that takes problem size into account, and find that strong scaling on GPUs can be characterized by saturation problem sizes as additional key metrics. Saturation problem sizes break up a plot of GPU performance vs. problem size into three distinct performance regimes: size-limited, cache-bound, and DRAM-bound. With our extended roofline model, we are able to provide a robust view of these performance regimes across recent GPU architectures.
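As a rough illustration of the ideas in the abstract, the sketch below shows the classic roofline bound (attainable performance is the smaller of peak compute and arithmetic intensity times peak bandwidth) next to a simple regime lookup based on two saturation sizes. This is not the authors' extended model: the function names, the two threshold parameters, the assumption that the regimes appear in the listed order as problem size grows, and every number used are hypothetical placeholders, not measured values for any GPU.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Classic roofline bound: min(peak compute, AI * peak memory bandwidth).

    arithmetic_intensity is in FLOP/byte, peak_gflops in GFLOP/s,
    peak_bandwidth_gbs in GB/s. The result is in GFLOP/s.
    """
    if arithmetic_intensity < 0:
        raise ValueError("arithmetic intensity must be non-negative")
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def classify_regime(problem_size, cache_saturation_size, dram_saturation_size):
    """Map a problem size onto the three regimes named in the abstract.

    Assumption (not from the source): the regimes occur in the order
    size-limited, cache-bound, DRAM-bound as problem size increases, and
    both thresholds are empirical values supplied by the caller.
    """
    if cache_saturation_size > dram_saturation_size:
        raise ValueError("expected cache_saturation_size <= dram_saturation_size")
    if problem_size < cache_saturation_size:
        return "size-limited"
    if problem_size < dram_saturation_size:
        return "cache-bound"
    return "DRAM-bound"


# Hypothetical machine: 10000 GFLOP/s peak compute, 1000 GB/s peak bandwidth.
PEAK_GFLOPS = 10000.0
PEAK_BANDWIDTH_GBS = 1000.0

memory_bound = roofline_gflops(1.0, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
compute_bound = roofline_gflops(100.0, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
```

With these placeholder numbers, a kernel at 1 FLOP/byte is limited by bandwidth (1000 GFLOP/s) and a kernel at 100 FLOP/byte by peak compute (10000 GFLOP/s). The regime lookup is deliberately separate from the bound, since the abstract's point is that arithmetic intensity alone does not capture the dependence on problem size.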

Original language: English
Title of host publication: Proceedings of P3HPC 2022
Subtitle of host publication: 2022 International Workshop on Performance, Portability and Productivity in HPC, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 26-35
Number of pages: 10
ISBN (Electronic): 9781665460217
DOIs
State: Published - 2022
Event: 5th IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC, P3HPC 2022 - Dallas, United States
Duration: Nov 13 2022 → Nov 18 2022

Publication series

Name: Proceedings of P3HPC 2022: 2022 International Workshop on Performance, Portability and Productivity in HPC, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 5th IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC, P3HPC 2022
Country/Territory: United States
City: Dallas
Period: 11/13/22 → 11/18/22

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • CUDA
  • GPU
  • HIP
  • HPC
  • performance analysis
  • roofline model
