Harnessing GPU Tensor Cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers

Azzam Haidar, Stanimire Tomov, Jack Dongarra, Nicholas J. Higham

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

155 Scopus citations

Abstract

Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem Ax = b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16-FP64) iterative refinement, and we generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to 4× speedup. This is due to the performance boost that the FP16-TC provide as well as to the improved accuracy over classical FP16 arithmetic that is obtained because the GEMM accumulation occurs in FP32 arithmetic.
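The refinement loop described in the abstract (factorize in low precision, then correct the solution with residuals computed in FP64) can be sketched in plain NumPy. This is a minimal analogue, not the authors' CUDA implementation: the Tensor Core FP16 factorization is simulated by rounding A to FP16 and solving in FP32, and `mixed_precision_refine` is a hypothetical helper name.

```python
import numpy as np

def mixed_precision_refine(A, b, iters=10):
    """Solve Ax = b to FP64 accuracy using a low-precision solver
    plus iterative refinement (sketch, assuming A is well conditioned)."""
    # Simulate an FP16 factorization: round A to FP16, solve in FP32.
    # (A stand-in for FP16 Tensor Core GEMMs accumulating in FP32;
    # a real implementation would factorize once and reuse the LU factors.)
    A_lo = A.astype(np.float16).astype(np.float32)
    x = np.linalg.solve(A_lo, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                     # residual computed in FP64
        d = np.linalg.solve(A_lo, r.astype(np.float32))  # low-precision correction
        x = x + d.astype(np.float64)      # update in FP64
    return x
```

Each pass contracts the error by roughly the low-precision unit roundoff times the condition number of A, so for well-conditioned systems a handful of iterations recovers full FP64 accuracy while the expensive factorization stays in fast low precision.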

Original language: English
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 603-613
Number of pages: 11
ISBN (Electronic): 9781538683842
DOIs
State: Published - Jul 2 2018
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 - Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country/Territory: United States
City: Dallas
Period: 11/11/18 - 11/16/18

Funding

Acknowledgments: This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The work was also partially supported by Nvidia and NSF grant No. OAC-1740250. N. J. Higham was supported by Engineering and Physical Sciences Research Council grant EP/P020720/1, The MathWorks, and the Royal Society.

Funders and funder numbers:
National Science Foundation - OAC-1740250
U.S. Department of Energy
National Nuclear Security Administration
Engineering and Physical Sciences Research Council - EP/P020720/1
Royal Society

Keywords

• FP16 Arithmetic
• GPU Computing
• Half Precision
• Iterative Refinement Computation
• Linear Algebra
• Mixed Precision Solvers
