TY - GEN
T1 - Variable-Size Batched Condition Number Calculation on GPUs
AU - Anzt, Hartwig
AU - Dongarra, Jack
AU - Flegar, Goran
AU - Grützmacher, Thomas
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - We present a kernel that is designed to quickly compute the condition number of a large collection of tiny matrices on a graphics processing unit (GPU). The matrices can differ in size, and the process integrates the use of pivoting to ensure a numerically stable matrix inversion. The performance assessment reveals that, in double precision arithmetic, the new GPU kernel achieves up to 550 GFLOPs (billions of floating-point operations per second) and 800 GFLOPs on NVIDIA's P100 and V100 GPUs, respectively. The results also demonstrate a considerable speed-up with respect to a workflow that computes the condition number via launching a set of four batched kernels. In addition, we present a variable-size batched kernel for the computation of the matrix infinity norm. We show that this memory-bound kernel achieves up to 90% of the sustainable peak bandwidth.
UR - http://www.scopus.com/inward/record.url?scp=85063127142&partnerID=8YFLogxK
U2 - 10.1109/CAHPC.2018.8645907
DO - 10.1109/CAHPC.2018.8645907
M3 - Conference contribution
AN - SCOPUS:85063127142
T3 - Proceedings - 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018
SP - 132
EP - 139
BT - Proceedings - 2018 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2018
Y2 - 24 September 2018 through 27 September 2018
ER -