Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs

Research output: Contribution to journal › Article › peer-review

Abstract

The Oak Ridge Leadership Computing Facility (OLCF) has a long history of supporting and promoting GPU-accelerated computing, starting with the deployment of the Titan supercomputer in 2012 and continuing with the Summit supercomputer, which has a theoretical peak performance of approximately 200 petaflops. Because the majority of Summit's computational power comes from its 27,648 GPUs, users must port their applications to one of the supported programming models to make efficient use of the system. To prepare for the transition to Frontier, the OLCF's exascale supercomputer, users will need to adapt to an entirely new ecosystem that includes new hardware and software technologies. First, users will need to familiarize themselves with the AMD Radeon GPU architecture. Furthermore, users who have previously relied on CUDA will need to transition to the Heterogeneous-Computing Interface for Portability (HIP) or to one of the other supported programming models (e.g., OpenMP, OpenACC). In this work, we describe our initial experiences and lessons learned in porting three applications or proxy apps currently running on Summit to the HPE/Cray ecosystem to leverage the compute power of AMD GPUs: minisweep, GenASiS, and Sparkler. Each is representative of current production workloads at the OLCF, and together they span different programming languages and different programming models.

Original language: English
Article number: e8113
Journal: Concurrency and Computation: Practice and Experience
Volume: 36
Issue number: 15
State: Published - Jul 10 2024

Keywords

  • CUDA
  • GPU accelerated computing
  • HIP
  • OpenMP
