Abstract
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we use the general HPC problem Ax = b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16-FP64) iterative refinement. We generalize and extend prior advances into a framework, for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show how using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup. The speedup comes from two sources: the raw performance boost that the FP16-TC provide, and their improved accuracy over classical FP16 arithmetic, which arises because the GEMM accumulation occurs in FP32 arithmetic.
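To make the FP16-FP64 iterative refinement idea concrete, here is a minimal NumPy sketch. It is not the authors' GPU implementation. The sketch makes several assumptions. An unblocked LU with partial pivoting is emulated in NumPy's `float16`, which rounds every operation to half precision in software, so it shows the numerics but not the Tensor Core speed. The FP32 accumulation of the Tensor Core GEMM is not modeled. The FP16 factors are promoted to FP64 for the triangular solves, which is a simplification. The function names `lu_lowprec`, `lu_solve`, and `mixed_precision_ir` are hypothetical. The O(n³) factorization is done once in low precision, and each refinement step then costs only O(n²) FP64 work for the residual and the correction.

```python
import numpy as np


def lu_lowprec(A, dtype=np.float16):
    """Hypothetical helper: unblocked LU with partial pivoting, each op rounded to `dtype`.

    Returns packed factors (unit-lower L strictly below the diagonal, U on and
    above it) and a permutation `piv` such that A[piv] ~= L @ U.
    """
    n = A.shape[0]
    LU = A.astype(dtype)                       # copy of A rounded to low precision
    piv = np.arange(n)
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(LU[k:, k])))  # partial pivoting
        if p != k:
            LU[[k, p]] = LU[[p, k]]
            piv[[k, p]] = piv[[p, k]]
        LU[k + 1:, k] /= LU[k, k]              # multipliers (column of L)
        # Rank-1 Schur-complement update; float16 operands give float16 results.
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU, piv


def lu_solve(LU, piv, rhs):
    """Solve A x = rhs using the packed low-precision factors.

    Assumption: the factors are promoted to FP64 for the triangular solves.
    """
    F = LU.astype(np.float64)
    n = F.shape[0]
    y = rhs[piv].astype(np.float64)            # apply the row permutation
    for i in range(n):                         # forward substitution, unit-diagonal L
        y[i] -= F[i, :i] @ y[:i]
    for i in range(n - 1, -1, -1):             # back substitution with U
        y[i] = (y[i] - F[i, i + 1:] @ y[i + 1:]) / F[i, i]
    return y


def mixed_precision_ir(A, b, low=np.float16, tol=None, max_iter=50):
    """Low-precision LU plus FP64 residuals/updates. Returns (x, corrections used)."""
    A = np.asarray(A, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if tol is None:
        tol = A.shape[0] * np.finfo(np.float64).eps
    LU, piv = lu_lowprec(A, low)               # expensive step, done once in low precision
    x = lu_solve(LU, piv, b)                   # initial, low-accuracy solution
    norm_A = np.linalg.norm(A, np.inf)
    for k in range(max_iter):
        r = b - A @ x                          # residual computed in FP64
        eta = np.linalg.norm(r, np.inf) / (
            norm_A * np.linalg.norm(x, np.inf) + np.linalg.norm(b, np.inf))
        if eta <= tol:                         # normwise backward error is small enough
            return x, k
        x = x + lu_solve(LU, piv, r)           # correction reuses the low-precision factors
    return x, max_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100
    A = rng.uniform(-1, 1, (n, n)) + n * np.eye(n)   # well-conditioned test matrix
    b = A @ np.ones(n)
    x, iters = mixed_precision_ir(A, b)
    print(iters, np.max(np.abs(x - 1.0)))
```

The loop can only recover FP64 accuracy when A is not too ill conditioned relative to the FP16 unit roundoff (2^-11 ≈ 4.9e-4). The diagonally dominant test matrix is chosen so that this holds comfortably.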
| Original language | English |
|---|---|
| Title of host publication | Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 603-613 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781538683842 |
| State | Published - Jul 2 2018 |
| Event | 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States; duration: Nov 11 2018 → Nov 16 2018 |
Publication series
| Name | Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
|---|---|
Conference
| Conference | 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
|---|---|
| Country/Territory | United States |
| City | Dallas |
| Period | 11/11/18 → 11/16/18 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The work was also partially supported by Nvidia and NSF grant No. OAC-1740250. N. J. Higham was supported by Engineering and Physical Sciences Research Council grant EP/P020720/1, The MathWorks, and the Royal Society.
Keywords
- FP16 Arithmetic
- GPU Computing
- Half Precision
- Iterative Refinement Computation
- Linear Algebra
- Mixed Precision Solvers
Fingerprint
Dive into the research topics of 'Harnessing GPU Tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers'. Together they form a unique fingerprint.