Abstract
The molecular dynamics simulation software LAMMPS uses the Kokkos acceleration library to port computation to a diverse set of architectures, including those based on GPU accelerators. In addition to Kokkos, LAMMPS contains a vast code base that uses the CUDA application programming interface through library functions such as cuFFT, CUDA's fast Fourier transform (FFT) library, and, more recently, rapidly growing support for AMD's Heterogeneous Interface for Portability (HIP). While preparing LAMMPS tests for the AMD GPU-based precursor test systems to Frontier, we investigated several strategies for accelerating LAMMPS on AMD GPUs, using the AMD Instinct MI100 and MI250X. In this work, we integrated the HIP FFT library, hipFFT, into the particle-particle particle-mesh (PPPM) long-range solver, which allowed PPPM calculations to be ported to the GPUs. Kokkos behavior on the MI100 and MI250X was also investigated through the package kokkos command of LAMMPS, targeting communication, memory usage, and particle grid decomposition. The Tersoff, Reax, Lennard-Jones (LJ), EAM, Granular, and PPPM potentials were investigated in this effort, and results from these experiments are provided. For comparison, the selected potentials were run on Spock (AMD Instinct MI100), Crusher (AMD Instinct MI250X), AFW HPC11 (NVIDIA A100), and Summit (NVIDIA V100). Operational roofline models were constructed and analyzed for the Tersoff, Reax, and Lennard-Jones potentials on Crusher and Summit.
Original language | English |
---|---|
Article number | e7895 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 35 |
Issue number | 28 |
DOIs | |
State | Published - Dec 25 2023 |
Funding
This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Funders | Funder number |
---|---|
DOE Public Access Plan | |
U.S. Department of Energy | |
Keywords
- GPU accelerated computing
- molecular dynamics
- performance portability
- roofline model