Abstract
Using NVIDIA graphics processing units (GPUs) equipped with Tensor Cores has enabled significant acceleration of general matrix multiplication (GEMM) for applications in machine learning (ML) and artificial intelligence (AI) and in high-performance computing (HPC) generally. The use of such power-efficient, specialized accelerators can provide a performance increase of between 8× and 20×, albeit with a loss in precision. However, many large scientific and HPC applications require a high level of precision, and computing in single or double precision is still necessary for these applications to maintain accuracy. Fortunately, mixed-precision methods can be employed to maintain a higher level of numerical precision while still taking advantage of the performance increases from computing with lower-precision AI cores. With this in mind, we extend the state of the art by using NVIDIA's new TF32 framework. This new framework not only removes some constraints of previous frameworks, such as costly 32-to-16-bit castings, but also provides equivalent precision and performance with a much simpler approach. We also propose a new framework called TF64 that emulates double-precision arithmetic with low-precision Tensor Cores. Although this framework does not exist in hardware yet, we validated the correctness of the idea and achieved the equivalent of 64-bit precision on 32-bit hardware.
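As a rough illustration of how higher-precision results can be recovered from lower-precision arithmetic, the sketch below shows the classic two-term splitting trick: each double operand is split into a high and a low float, and the significant cross products are accumulated in higher precision. This is a minimal, self-contained host-side sketch of the general splitting technique only, not the paper's TF32/TF64 implementation; the variable names and the scalar (rather than GEMM) setting are illustrative assumptions.

```cpp
// Minimal sketch (not the authors' implementation): recovering near-double
// accuracy from single-precision products via operand splitting.
#include <cstdio>

int main() {
    double a = 1.0 / 3.0;   // operands we would like to multiply "in double"
    double b = 7.0 / 11.0;

    // Split each double into two floats: hi holds the leading bits,
    // lo holds the residual that the float cast discarded.
    float a_hi = static_cast<float>(a);
    float a_lo = static_cast<float>(a - static_cast<double>(a_hi));
    float b_hi = static_cast<float>(b);
    float b_lo = static_cast<float>(b - static_cast<double>(b_hi));

    // Accumulate the three significant partial products in higher precision.
    // On a GPU, each partial product would correspond to a lower-precision
    // Tensor Core GEMM whose results are summed in a wider accumulator.
    double approx = static_cast<double>(a_hi) * b_hi
                  + static_cast<double>(a_hi) * b_lo
                  + static_cast<double>(a_lo) * b_hi;

    double exact = a * b;
    printf("exact = %.17g\n", exact);
    printf("split = %.17g\n", approx);
    printf("error = %.3g\n", exact - approx);
    return 0;
}
```

In a GEMM setting, the same idea applies matrix-wise: the input matrices are split into high and low parts, the cross-product GEMMs run on the low-precision cores, and the partial results are combined in a wider accumulator to recover most of the lost precision.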
Original language | English |
---|---|
Title of host publication | Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
Publisher | Association for Computing Machinery |
Pages | 179-186 |
Number of pages | 8 |
ISBN (Electronic) | 9798400707858 |
DOIs | |
State | Published - Nov 12 2023 |
Event | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States; Duration: Nov 12 2023 → Nov 17 2023 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 11/12/23 → 11/17/23 |
Funding
This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the DOE's Office of Science and the National Nuclear Security Administration. This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the DOE. The publisher, by accepting the article for publication, acknowledges that the US Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript or allow others to do so, for US Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility and the Experimental Computing Laboratory at the Oak Ridge National Laboratory, which is supported by DOE's Office of Science under Contract No. DE-AC05-00OR22725.
Keywords
- GEMM
- GPUs
- Mixed Precision
- Tensor Core