GPU-enabled extreme-scale turbulence simulations: Fourier pseudo-spectral algorithms at the exascale using OpenMP offloading

P. K. Yeung, Kiran Ravikumar, Stephen Nichols, Rohini Uma-Vaideswaran

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Fourier pseudo-spectral methods for nonlinear partial differential equations are of wide interest in many areas of advanced computational science, including direct numerical simulation of three-dimensional (3-D) turbulence governed by the Navier-Stokes equations in fluid dynamics. This paper presents a new capability for simulating turbulence at a record resolution of up to 35 trillion grid points on the world's first exascale computer, Frontier, which comprises AMD MI250X GPUs connected by HPE's Slingshot interconnect and is operated by the US Department of Energy's Oak Ridge Leadership Computing Facility (OLCF). Key programming strategies designed to take maximum advantage of the machine architecture include performing almost all computations on the GPU, which has the same memory capacity as the CPU; performing all-to-all communication among sets of parallel processes directly on the GPU; and targeting the GPUs efficiently using OpenMP offloading for intensive number-crunching, including 1-D Fast Fourier Transforms (FFTs) performed using AMD ROCm library calls. Since 99% of the computing power on Frontier resides on the GPUs, leaving the CPU idle yields a net performance gain by avoiding the overhead of data movement between host and device, except when needed for some I/O purposes. The memory footprint, including the size of communication buffers for MPI_ALLTOALL, is managed carefully to maximize the largest problem size possible for a given node count. Detailed performance data, including separate contributions from different categories of operations to the elapsed wall time per step, are reported for five grid resolutions, from 2048³ on a single node to 32768³ on 4096 or 8192 nodes out of the 9408 on the system. Both 1-D and 2-D domain decompositions, which divide the 3-D periodic domain into slabs and pencils respectively, are implemented. The present code suite (labeled by the acronym GESTS, GPUs for Extreme Scale Turbulence Simulations) achieves a figure of merit (in grid points per second) exceeding the goals set in the Center for Accelerated Application Readiness (CAAR) program for Frontier. The performance attained is highly favorable in both weak scaling and strong scaling, with notable departures only for 2048³, where communication is entirely intra-node, and for 32768³, where a challenge due to small message sizes does arise. Communication performance is addressed further using a lightweight test code that performs all-to-all communication in a manner matching the full turbulence simulation code. Performance at large problem sizes is affected both by small message sizes due to high node counts and by dragonfly network topology features of the machine, but is consistent with official expectations of sustained performance on Frontier. Overall, although not perfect, the scalability achieved at the extreme problem size of 32768³ (and up to 8192 nodes, which corresponds to hardware rated at just under 1 exaflop/s of theoretical peak computational performance) is arguably better than the scalability observed using prior state-of-the-art algorithms on Frontier's predecessor machine (Summit) at OLCF. New science results for the study of intermittency in turbulence enabled by this code and its extensions are to be reported separately in the near future.
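Illustrative sketch (not from the paper): the central strategy summarized above is to keep data resident on the GPU via OpenMP offloading and to hand device pointers directly to GPU-aware MPI for the all-to-all exchange, so that no host staging is needed. The minimal C sketch below shows that general pattern under stated assumptions: the buffer names, sizes, and surrounding setup are invented for illustration and are not the GESTS implementation, and the 1-D FFTs that the real code performs with AMD ROCm library calls are omitted.

/* Minimal sketch (not the GESTS code): device-resident buffers managed with
 * OpenMP offloading, with device addresses passed directly to a GPU-aware
 * MPI_Alltoall.  Buffer names and sizes are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    const int chunk = 1 << 20;                 /* doubles sent to each rank (illustrative) */
    const size_t n  = (size_t)chunk * nproc;   /* total buffer length per rank */
    double *sendbuf = malloc(n * sizeof *sendbuf);
    double *recvbuf = malloc(n * sizeof *recvbuf);

    /* Keep both buffers resident on the GPU for the whole exchange. */
    #pragma omp target data map(tofrom: sendbuf[0:n], recvbuf[0:n])
    {
        /* Fill the send buffer on the device (stand-in for real compute/FFT work). */
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; i++)
            sendbuf[i] = (double)rank + 1.0e-6 * (double)i;

        /* Hand device addresses straight to GPU-aware MPI: no host staging. */
        #pragma omp target data use_device_ptr(sendbuf, recvbuf)
        {
            MPI_Alltoall(sendbuf, chunk, MPI_DOUBLE,
                         recvbuf, chunk, MPI_DOUBLE, MPI_COMM_WORLD);
        }
        /* ...subsequent on-device transposes and 1-D FFTs would go here... */
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

Built with an OpenMP-offload-capable compiler and a GPU-aware MPI library, the MPI_Alltoall call above receives device pointers through the use_device_ptr clause, which is the generic mechanism for performing all-to-all communication directly on the GPU as described in the abstract; the exact build flags, library choices, and communication layout used on Frontier are not taken from the paper.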

Original language: English
Article number: 109364
Journal: Computer Physics Communications
Volume: 306
State: Published - Jan 2025

Funding

The work at Georgia Tech is sustained by a subaward from NSF via CSSI Grant 2103874 led by C. Meneveau of The Johns Hopkins University. For strong encouragement and valuable technical advice we are indebted to many current or former members of OLCF, HPE and AMD staff, including (in alphabetical order) Steve Abbott, Alessandro Fanfarillo, Oscar Hernandez, John Levesque, Nick Malaya, Bronson Messer, Mark Stock, Matthew Turner, and Jack Wells. PKY also acknowledges the impetus for work on turbulence from enduring science collaborations with Toshiyuki Gotoh, Charles Meneveau, Stephen B. Pope, and Katepalli R. Sreenivasan. Finally, we thank two anonymous reviewers for their constructive comments. This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher acknowledges the US government license to provide public access under the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors gratefully acknowledge support from the OLCF CAAR program from 2020 to 2023 and the DOE INCITE program during 2023. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

Keywords

  • 3D fast Fourier transform
  • Direct numerical simulations
  • Exascale
  • GPU-aware MPI
  • OpenMP offloading
  • Turbulence
