Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems: Linear systems, mixed precision, HPC

Azzam Haidar, Harun Bayraktar, Stanimire Tomov, Jack Dongarra, Nicholas J. Higham

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4×-5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
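As a rough illustration of the mixed-precision refinement scheme the abstract describes, the following Python/SciPy sketch runs on the CPU and uses a float32 LU factorization as a stand-in for the FP16/FP32 tensor-core factorization, with residuals accumulated in float64 and the correction equation solved by GMRES preconditioned with the low-precision factors. The function names, tolerances, and test matrix are illustrative assumptions, not the authors' GPU implementation.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres


def gmres_ir(A, b, tol=1e-12, max_refine=20):
    """Mixed-precision iterative refinement with a GMRES correction solver.

    float32 stands in for the FP16/FP32 tensor-core factorization of the
    paper; residuals and updates are accumulated in float64.
    """
    n = A.shape[0]
    # "Low"-precision LU factorization (float32 as a stand-in for FP16/FP32).
    lu, piv = lu_factor(A.astype(np.float32))

    # Preconditioner that applies the low-precision LU factors, promoting
    # the result back to float64 for the outer iteration.
    def apply_lu(v):
        return lu_solve((lu, piv), v.astype(np.float32)).astype(np.float64)

    M = LinearOperator((n, n), matvec=apply_lu, dtype=np.float64)

    x = apply_lu(b)                                   # initial low-precision solve
    for _ in range(max_refine):
        r = b - A @ x                                 # residual in FP64
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            break
        # Correction equation A d = r, solved by GMRES preconditioned
        # with the low-precision LU factors.
        d, _ = gmres(A, r, M=M, restart=30, maxiter=5)
        x = x + d
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    b = rng.standard_normal(n)
    x = gmres_ir(A, b)
    print(np.linalg.norm(A @ x - b, np.inf) / np.linalg.norm(b, np.inf))
```

The design point this sketch tries to capture is that only the O(n^3) factorization is done in reduced precision, while the cheap O(n^2) residual and update steps run in FP64, which is what lets the refinement recover FP64-level accuracy.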

Original language: English
Article number: 0110
Journal: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Volume: 476
Issue number: 2243
DOIs
State: Published - 2020

Funding

The work of A.H. and H.B. was supported by NVIDIA. The work of S.T. and J.D. was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy's Office of Science and National Nuclear Security Administration. The work of N.J.H. was supported by Engineering and Physical Sciences Research Council grant EP/P020720/1 and the Royal Society.

Data accessibility. The software can be found at https://developer.nvidia.com/cuda-downloads. This article has no additional data.

Authors' contributions. All authors drafted and revised the manuscript. All authors read and approved the manuscript for publication and agree to be held accountable for the work performed therein.

Competing interests. We declare we have no competing interests.

Acknowledgements. We thank the anonymous reviewers for their insightful comments and suggestions that greatly improved the manuscript.

Keywords

  • GMRES
  • GPU computing
  • LU factorization
  • half precision arithmetic
  • iterative refinement
  • mixed precision solvers
