TY - GEN
T1 - Progressive Optimization of Batched LU Factorization on GPUs
AU - Abdelfattah, Ahmad
AU - Tomov, Stanimire
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - This paper presents a progressive approach to optimizing batched LU factorization on graphics processing units (GPUs). It shows that relying on level-3 BLAS routines for performance does not pay off, and that the memory-bound part of the algorithm needs attention, especially when the problem size is very small. In this context, we develop a size-aware multi-level blocking technique that applies kernel fusion at different granularities according to the problem size. Our experiments, conducted on a Tesla V100 GPU, show that the multi-level blocking technique achieves speedups of up to 3.28× in single precision and 2.69× in double precision over the generic LAPACK-style implementation. It is also up to 8.72× and 7.2× faster than the cuBLAS library in single and double precision, respectively. The developed solution is integrated into the open-source MAGMA library.
KW - Batch computation
KW - GPU computing
KW - LU factorization
UR - http://www.scopus.com/inward/record.url?scp=85076680461&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2019.8916270
DO - 10.1109/HPEC.2019.8916270
M3 - Conference contribution
AN - SCOPUS:85076680461
T3 - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
BT - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE High Performance Extreme Computing Conference, HPEC 2019
Y2 - 24 September 2019 through 26 September 2019
ER -