Abstract
We present a kernel designed to quickly compute the condition number of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size, and the kernel uses pivoting to ensure a numerically stable matrix inversion. The performance assessment reveals that, in double-precision arithmetic, the new GPU kernel achieves up to 550 GFLOP/s (billions of floating-point operations per second) on NVIDIA's P100 GPU and up to 800 GFLOP/s on the V100 GPU. The results also demonstrate a considerable speed-up over a workflow that computes the condition number by launching a set of four batched kernels. In addition, we present a variable-size batched kernel for computing the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.
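To make the computation concrete, here is a minimal CPU-side sketch in plain Python. It is not the authors' GPU kernel. It assumes the condition number is taken in the infinity norm, as the product of the infinity norms of the matrix and its inverse, and that the inverse is formed by Gauss-Jordan elimination with partial (row) pivoting. The function names `inf_norm`, `invert`, `cond_inf`, and `batched_cond_inf` are hypothetical and chosen for illustration only.

```python
def inf_norm(a):
    """Matrix infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination
    with partial (row) pivoting. Assumption: illustrative only."""
    n = len(a)
    # Build the augmented matrix [A | I].
    m = [
        [float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for k in range(n):
        # Partial pivoting: choose the row with the largest entry in column k.
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Scale the pivot row so the pivot becomes 1.
        pivot = m[k][k]
        m[k] = [x / pivot for x in m[k]]
        # Eliminate column k from every other row.
        for r in range(n):
            if r != k:
                f = m[r][k]
                m[r] = [x - f * y for x, y in zip(m[r], m[k])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in m]


def cond_inf(a):
    """Condition number in the infinity norm: ||A|| * ||A^-1||."""
    return inf_norm(a) * inf_norm(invert(a))


def batched_cond_inf(batch):
    """Apply cond_inf to each matrix of a batch; sizes may differ."""
    return [cond_inf(a) for a in batch]
```

For example, `batched_cond_inf([[[2.0]], [[1.0, 2.0], [3.0, 4.0]]])` handles a 1×1 and a 2×2 matrix in a single call; the second matrix has an infinity-norm condition number of about 21. A GPU implementation would process the matrices in parallel rather than in a sequential loop.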
Original language | English |
---|---|
Title of host publication | Proceedings - 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 132-139 |
Number of pages | 8 |
ISBN (Electronic) | 9781538677698 |
State | Published - Jul 2 2018 |
Event | 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018 - Lyon, France; Duration: Sep 24 2018 → Sep 27 2018 |
Publication series
Name | Proceedings - 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018 |
---|---|
Conference
Conference | 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018 |
---|---|
Country/Territory | France |
City | Lyon |
Period | 09/24/18 → 09/27/18 |
Funding
This work was partly funded by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Number DE-SC-0010042. H. Anzt was supported by the “Impuls und Vernetzungsfond” of the Helmholtz Association under grant VH-NG-1241. G. Flegar was supported by projects TIN2014-53495-R of the Spanish Ministerio de Economía y Competitividad and the EU H2020 project 732631 OPRECOMP.