Abstract
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, 2013). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., 2008). The software supports short-ranged pair and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We demonstrate equivalent or superior scaling on up to 3375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide additional performance gains in full double-precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5×.
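The abstract refers to MPI domain decomposition without detail. As an illustration only (not the authors' implementation), the following minimal sketch shows how a spatial decomposition is commonly set up with MPI's Cartesian-communicator API: the simulation box is split into a 3D grid of sub-domains, one per rank, and each rank identifies the neighbors it would exchange particle and ghost data with. The box edge length `L` and the rank-0 printout are purely illustrative assumptions.

```cpp
// Generic sketch of a 3D spatial domain decomposition with MPI.
// Not HOOMD-blue code; illustrative assumptions are marked in comments.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int nranks;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Let MPI pick a balanced 3D processor grid (e.g. 15x15x15 for 3375 ranks).
    int dims[3] = {0, 0, 0};
    MPI_Dims_create(nranks, 3, dims);

    // Periodic boundaries in all directions, as in a typical MD simulation box.
    int periods[3] = {1, 1, 1};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, /*reorder=*/1, &cart);

    int rank;
    MPI_Comm_rank(cart, &rank);

    // Sub-domain owned by this rank: a slice of a hypothetical cubic box of edge L.
    const double L = 100.0;   // illustrative box edge length
    int coords[3];
    MPI_Cart_coords(cart, rank, 3, coords);
    double lo[3], hi[3];
    for (int d = 0; d < 3; ++d)
    {
        lo[d] = coords[d] * L / dims[d];
        hi[d] = (coords[d] + 1) * L / dims[d];
    }

    // Neighbor ranks along each axis; particles leaving the sub-domain and
    // ghost-layer copies near its boundary would be communicated to these ranks.
    for (int d = 0; d < 3; ++d)
    {
        int minus, plus;
        MPI_Cart_shift(cart, d, 1, &minus, &plus);
        if (rank == 0)
            printf("axis %d: neighbors %d / %d, slab [%g, %g)\n",
                   d, minus, plus, lo[d], hi[d]);
    }

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

In an MD code, each rank would additionally migrate particles that cross sub-domain boundaries and refresh ghost particles within the interaction cutoff every neighbor-list rebuild; the sketch above only establishes the process grid and neighbor topology.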
Field | Value
---|---
Original language | English
Pages (from-to) | 97-107
Number of pages | 11
Journal | Computer Physics Communications
Volume | 192
DOIs | |
State | Published - Jul 1 2015
Externally published | Yes
Funding
This material is based upon work supported by the DOD/ASD(R&E) under Award No. N00244-09-1-0062 (JG, JAA, JAM, SCG). JG acknowledges support by DFG grant GL733/1-1. We also acknowledge support by the National Science Foundation, Division of Materials Research, award DMR 1409620 (JAA and SCG), and award DMR 0907338 (JG and DCM). This work was partially supported by a Simons Investigator award from the Simons Foundation to Sharon Glotzer (SCG, JG). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number ACI 1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana–Champaign and its National Center for Supercomputing Applications. We thank the University of Cambridge for providing access to their Wilkes cluster. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the DOD/ASD(R&E). The Glotzer Group at the University of Michigan is a CUDA Research Center. Hardware support by NVIDIA is gratefully acknowledged.
Funders | Funder number |
---|---|
National Science Foundation | 1238993, 1409620, 0907338, 1515306 |
U.S. Department of Defense | N00244-09-1-0062 |
U.S. Department of Energy | DE-AC05-00OR22725 |
Division of Materials Research | DMR 0907338, DMR 1409620 |
Simons Foundation | |
Office of Science | |
NVIDIA | |
Deutsche Forschungsgemeinschaft | GL733/1-1 |
Keywords
- Domain decomposition
- LAMMPS
- MPI/CUDA
- Molecular dynamics
- Multi-GPU
- Strong scaling
- Weak scaling