TY - JOUR
T1 - Early experiences evaluating the HPE/Cray ecosystem for AMD GPUs
AU - Melesse Vergara, Verónica G.
AU - Budiardja, Reuben D.
AU - Joubert, Wayne
N1 - Publisher Copyright:
© 2024 John Wiley & Sons Ltd.
PY - 2024/7/10
Y1 - 2024/7/10
N2 - The Oak Ridge Leadership Computing Facility (OLCF) has a long history of supporting and promoting GPU-accelerated computing, starting with the deployment of the Titan supercomputer in 2012 and continuing with the Summit supercomputer, which has a theoretical peak performance of approximately 200 petaflops. Because the majority of Summit's computational power comes from its 27,972 GPUs, users must port their applications to one of the supported programming models in order to make efficient use of the system. To prepare for the transition to Frontier, the OLCF's exascale supercomputer, users will need to adapt to an entirely new ecosystem which will include new hardware and software technologies. First, users will need to familiarize themselves with the AMD Radeon GPU architecture. Furthermore, users who have previously relied on CUDA will need to transition to the Heterogeneous-Computing Interface for Portability (HIP) or one of the other supported programming models (e.g., OpenMP, OpenACC). In this work, we describe our initial experiences and lessons learned in porting three applications or proxy apps currently running on Summit to the HPE/Cray ecosystem to leverage the compute power from AMD GPUs: minisweep, GenASiS, and Sparkler. Each one is representative of current production workloads utilized at the OLCF, different programming languages, and different programming models.
AB - The Oak Ridge Leadership Computing Facility (OLCF) has a long history of supporting and promoting GPU-accelerated computing, starting with the deployment of the Titan supercomputer in 2012 and continuing with the Summit supercomputer, which has a theoretical peak performance of approximately 200 petaflops. Because the majority of Summit's computational power comes from its 27,972 GPUs, users must port their applications to one of the supported programming models in order to make efficient use of the system. To prepare for the transition to Frontier, the OLCF's exascale supercomputer, users will need to adapt to an entirely new ecosystem which will include new hardware and software technologies. First, users will need to familiarize themselves with the AMD Radeon GPU architecture. Furthermore, users who have previously relied on CUDA will need to transition to the Heterogeneous-Computing Interface for Portability (HIP) or one of the other supported programming models (e.g., OpenMP, OpenACC). In this work, we describe our initial experiences and lessons learned in porting three applications or proxy apps currently running on Summit to the HPE/Cray ecosystem to leverage the compute power from AMD GPUs: minisweep, GenASiS, and Sparkler. Each one is representative of current production workloads utilized at the OLCF, different programming languages, and different programming models.
KW - CUDA
KW - GPU accelerated computing
KW - HIP
KW - OpenMP
UR - http://www.scopus.com/inward/record.url?scp=85190497444&partnerID=8YFLogxK
U2 - 10.1002/cpe.8113
DO - 10.1002/cpe.8113
M3 - Article
AN - SCOPUS:85190497444
SN - 1532-0626
VL - 36
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 15
M1 - e8113
ER -