Abstract
This study proposes a fast low-order finite element solver for crustal deformation computations that exploits Tensor Cores, the AI-oriented hardware units on Volta GPUs. Tensor Cores can compute large matrix-matrix multiplications rapidly in half precision. We redesign a state-of-the-art solver algorithm so that lower-precision data types can be used and memory access costs are reduced even when the matrices are small. With the proposed solver, we solved two-layered problems with 13 billion degrees of freedom that mimic the Earth's crust and mantle using 36 compute nodes of Summit. In the matrix-vector kernel, we obtained a 4.1-fold speedup over a standard single-precision kernel. The proposed solver increases the FLOP count of the entire solver; however, the time-to-solution was reduced by a factor of 1.7 because the Tensor Cores deliver high effective performance.
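The trade-off behind lower-precision matrix-vector products can be illustrated with a minimal sketch. It emulates, in plain Python, a small dense matrix-vector product whose inputs are rounded to half precision (FP16) and whose partial sums are kept in single precision (FP32), then compares the result with a double-precision reference. The 3×3 matrix, the vector, and all function names are hypothetical and chosen only for illustration; this emulates the number formats only, not the Tensor Core hardware, its accumulation order, or the solver kernel described in the abstract.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matvec_fp16_in_fp32_acc(a, x):
    """Emulate y = A x with FP16 inputs and an FP32 accumulator."""
    y = []
    for row in a:
        acc = 0.0
        for a_ij, x_j in zip(row, x):
            # Inputs are rounded to FP16 before multiplying; for the small
            # magnitudes used here the product fits exactly in FP32.
            prod = to_fp16(a_ij) * to_fp16(x_j)
            # The running sum is rounded to FP32 after every addition.
            acc = to_fp32(acc + prod)
        y.append(acc)
    return y


def matvec_fp64(a, x):
    """Reference y = A x in Python's native double precision."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Hypothetical small matrix and vector (illustrative values only).
A = [
    [4.0, -1.0, 0.25],
    [-1.0, 4.0, -1.0],
    [0.25, -1.0, 4.0],
]
x = [0.1, 0.2, 0.3]

y_mixed = matvec_fp16_in_fp32_acc(A, x)
y_ref = matvec_fp64(A, x)

# Largest relative deviation of the mixed-precision result from the reference.
max_rel_err = max(abs(m - r) / abs(r) for m, r in zip(y_mixed, y_ref))
print("mixed precision:", y_mixed)
print("reference      :", y_ref)
print("max relative error:", max_rel_err)
```

In this sketch the error is dominated by rounding the inputs to FP16 rather than by the FP32 accumulation. This suggests why an iterative solver may need extra work to tolerate lower-precision kernels, which would be consistent with the higher FLOP count the abstract reports.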
Original language | English |
---|---|
Title of host publication | Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2020 |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450379939 |
State | Published - Jun 29 2020 |
Event | 7th Annual Platform for Advanced Scientific Computing Conference, PASC 2020 - Geneva, Switzerland; Duration: Jun 29 2020 → Jul 1 2020 |
Publication series
Name | Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2020 |
---|---|
Conference
Conference | 7th Annual Platform for Advanced Scientific Computing Conference, PASC 2020 |
---|---|
Country/Territory | Switzerland |
City | Geneva |
Period | 06/29/20 → 07/01/20 |
Funding
Our results were obtained using Summit at the Oak Ridge Leadership Computing Facility, a US Department of Energy Office of Science User Facility at Oak Ridge National Laboratory (ORNL). We thank Yukihiko Hirano (NVIDIA) for coordinating the collaborative research project. We thank Christopher B. Fuson, Don E. Maxwell, Oscar Hernandez, Scott Atchley, and Veronica Melesse-Vergara (ORNL); Jeff Larkin and Stephen Abbott (NVIDIA); Lixiang Luo (IBM); and Richard Graham (Mellanox Technologies) for generous support concerning the use of Summit. We thank Noda Tomoyuki and Hikaru Inoue (Fujitsu Limited) for support in program development. We acknowledge support from the Japan Society for the Promotion of Science (18H05239 and 18K18873).
Keywords
- Conjugate gradient method
- Finite element analysis
- GPU computation
- Transprecision computing