A massively parallel and scalable multi-GPU material point method

Xinlei Wang, Yuxing Qiu, Stuart R. Slattery, Yu Fang, Minchen Li, Song Chun Zhu, Yixin Zhu, Min Tang, Dinesh Manocha, Chenfanfu Jiang

Research output: Contribution to journal › Article › peer-review

50 Scopus citations

Abstract

Harnessing the power of modern multi-GPU architectures, we present a massively parallel simulation system based on the Material Point Method (MPM) for simulating physical behaviors of materials undergoing complex topological changes, self-collision, and large deformations. Our system makes three critical contributions. First, we introduce a new particle data structure that promotes coalesced memory access patterns on the GPU and eliminates the need for complex atomic operations on the memory hierarchy when writing particle data to the grid. Second, we propose a kernel fusion approach using a new Grid-to-Particles-to-Grid (G2P2G) scheme, which efficiently reduces GPU kernel launches, improves latency, and significantly reduces the amount of global memory needed to store particle data. Finally, we introduce optimized algorithmic designs that allow for efficient sparse grids in a shared memory context, enabling us to best utilize modern multi-GPU computational platforms for hybrid Lagrangian-Eulerian computational patterns. We demonstrate the effectiveness of our method with extensive benchmarks, evaluations, and dynamic simulations with elastoplasticity, granular media, and fluid dynamics. In comparisons against an open-source and heavily optimized CPU-based MPM codebase [Fang et al. 2019] on an elastic-sphere collision scene with particle counts ranging from 5 to 40 million, our GPU MPM achieves over 100x per-time-step speedup on a workstation with an Intel 8086K CPU and a single Quadro P6000 GPU, exposing exciting possibilities for future MPM simulations in computer graphics and computational science. Moreover, compared to the state-of-the-art GPU MPM method [Hu et al. 2019a], we not only achieve 2x acceleration on a single GPU, but our kernel fusion strategy and Array-of-Structs-of-Arrays (AoSoA) data structure design also generalize to multi-GPU systems. Our multi-GPU MPM exhibits near-perfect weak and strong scaling with 4 GPUs, enabling performant and large-scale simulations on a 1024³ grid with close to 100 million particles in less than 4 minutes per frame on a single 4-GPU workstation, and with 134 million particles in less than 1 minute per frame on an 8-GPU workstation.
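
The AoSoA layout named in the abstract can be illustrated with a small indexing sketch. This is not the authors' implementation: the block width `LANES`, the four-channel particle record (mass plus a 3D position), and the helper names `aos_offset`, `aosoa_offset`, and `pack_aosoa` are all assumptions chosen for illustration. The sketch only shows why grouping particles into fixed-size blocks and storing each attribute contiguously within a block gives neighbouring particles unit-stride access to the same attribute, which is the access pattern that coalesces on a GPU.

```python
# Minimal AoSoA indexing sketch (hypothetical layout, not the paper's code).
LANES = 32     # particles per block; assumed to match a GPU warp width
CHANNELS = 4   # attributes per particle: mass, x, y, z (assumed)


def aos_offset(i, c):
    """Flat offset of channel c of particle i in a plain Array-of-Structs."""
    return i * CHANNELS + c


def aosoa_offset(i, c):
    """Flat offset of channel c of particle i in an Array-of-Structs-of-Arrays.

    Particles are grouped into blocks of LANES; inside a block, each channel
    occupies LANES consecutive slots.
    """
    block, lane = divmod(i, LANES)
    return block * CHANNELS * LANES + c * LANES + lane


def pack_aosoa(attrs):
    """Pack a list of per-particle records into one flat AoSoA buffer.

    attrs: list of length n (n a multiple of LANES), each entry a sequence
    of CHANNELS values.
    """
    n = len(attrs)
    if n % LANES != 0:
        raise ValueError("particle count must be a multiple of LANES")
    buf = [0.0] * (n * CHANNELS)
    for i, record in enumerate(attrs):
        for c, value in enumerate(record):
            buf[aosoa_offset(i, c)] = value
    return buf


if __name__ == "__main__":
    # Two blocks of particles with easily recognisable values.
    particles = [[float(i * CHANNELS + c) for c in range(CHANNELS)]
                 for i in range(2 * LANES)]
    buf = pack_aosoa(particles)
    # Stride between neighbouring particles when reading the same channel.
    print("AoS stride:  ", aos_offset(1, 1) - aos_offset(0, 1))
    print("AoSoA stride:", aosoa_offset(1, 1) - aosoa_offset(0, 1))
```

In the plain AoS layout, adjacent particles reading the same attribute are `CHANNELS` slots apart, whereas in this AoSoA sketch they are adjacent in memory within a block. The actual block size and attribute set used in the paper are not stated in the abstract.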

Original language: English
Article number: 30
Journal: ACM Transactions on Graphics
Volume: 39
Issue number: 4
DOIs
State: Published - Jul 8 2020

Funding

We thank Yuanming Hu at MIT for useful discussions and proofreading, Feng Gao at UCLA for his help on configuring workstations, and the anonymous reviewers for their valuable comments. X. W. and M. T. were supported in part by the National Key R&D Program of China (2017YFB1002703), and NSFC (61972341, 61972342, 61732015, 61572423). Penn authors were supported in part by the NSF CAREER (IIS-1943199) and CCF-1813624, DOE ORNL contract 4000171342, a gift from Adobe Inc., NVIDIA GPU grants, and Houdini licenses from SideFX. UCLA authors were supported in part by ONR MURI N00014-16-1-2007, DARPA XAI N66001-17-2-4029, and ONR N00014-19-1-2153. This research was supported by the Exascale Computing Project (17-SC-20-SC). This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • GPU
  • numerical methods
  • parallel computing
