Abstract
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing applications, especially those in artificial intelligence. Here, we present an investigation showing that other high-performance computing (HPC) applications can also harness this power. Specifically, we target the general HPC problem of solving Ax = b, where A is a large dense matrix and a double-precision (FP64) solution is needed for accuracy. Our approach is based on mixed-precision (FP16-FP64) iterative refinement, and we generalize and extend prior advances into a framework for which we develop architecture-specific algorithms and highly tuned implementations. These new methods show that using half-precision Tensor Cores (FP16-TC) for the arithmetic can provide up to a 4× speedup. The gain comes both from the performance boost of the FP16-TC and from their improved accuracy over classical FP16 arithmetic, obtained because the GEMM accumulation occurs in FP32.
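The refinement scheme the abstract describes can be illustrated in a few lines. Below is a minimal sketch, assuming only NumPy and SciPy: the FP16 Tensor Core factorization is emulated by rounding A to half precision before a CPU LU factorization (a hypothetical stand-in; real FP16-TC GEMMs run on the GPU and accumulate in FP32), while the residual and correction steps run in FP64, as in classical iterative refinement. The function name `ir_solve` and the test problem are illustrative, not from the paper.

```python
# Minimal sketch of FP16/FP64 iterative refinement (emulated on CPU).
# Assumption: rounding A to np.float16 stands in for an FP16 Tensor Core
# LU factorization; it does not model the FP32 accumulation of real FP16-TC.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ir_solve(A, b, tol=1e-12, max_iter=50):
    """Solve Ax = b to FP64 accuracy using a low-precision LU factorization."""
    A_lo = A.astype(np.float16).astype(np.float64)  # round A to FP16 storage
    lu, piv = lu_factor(A_lo)                       # O(n^3) work, low precision
    x = lu_solve((lu, piv), b)                      # initial low-precision solve
    for _ in range(max_iter):
        r = b - A @ x                               # residual in FP64, O(n^2)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break                                   # converged to FP64 accuracy
        x += lu_solve((lu, piv), r)                 # cheap O(n^2) triangular solves
    return x

# Example on a well-conditioned random system (hypothetical test problem).
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)     # diagonally dominant
b = rng.standard_normal(n)
x = ir_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # should be near FP64 accuracy
```

The cost structure mirrors the paper's argument: the O(n^3) factorization is done once in fast low precision, while each refinement step costs only O(n^2), so the FP64-accurate result is obtained at close to low-precision speed. Convergence requires A to be reasonably well conditioned relative to the FP16 unit roundoff.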
Original language | English |
---|---|
Title of host publication | Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 603-613 |
Number of pages | 11 |
ISBN (Electronic) | 9781538683842 |
DOIs | |
State | Published - Jul 2 2018 |
Event | 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States |
Duration | Nov 11 2018 → Nov 16 2018 |
Publication series
Name | Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
---|---|
Conference
Conference | 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 |
---|---|
Country/Territory | United States |
City | Dallas |
Period | 11/11/18 → 11/16/18 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The work was also partially supported by Nvidia and NSF grant No. OAC-1740250. N. J. Higham was supported by Engineering and Physical Sciences Research Council grant EP/P020720/1, The MathWorks, and the Royal Society.
Keywords
- FP16 Arithmetic
- GPU Computing
- Half Precision
- Iterative Refinement Computation
- Linear Algebra
- Mixed Precision Solvers