TY - GEN
T1 - Autotuning batch Cholesky factorization in CUDA with interleaved layout of matrices
AU - Gates, Mark
AU - Kurzak, Jakub
AU - Luszczek, Piotr
AU - Pei, Yu
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - Batch matrix operations address the case of solving the same linear algebra problem for a very large number of very small matrices. In this paper, we focus on implementing the batch Cholesky factorization in CUDA, in single precision arithmetic, for NVIDIA GPUs. Specifically, we look into the benefits of using noncanonical data layouts, where consecutive memory locations store elements with the same row and column index in a set of consecutive matrices. We discuss a number of different implementation options and tuning parameters. We demonstrate superior performance to traditional implementations for the case of very small matrices.
KW - Cholesky factorization
KW - GPU computing
KW - batch computation
KW - data layout
KW - numerical linear algebra
UR - http://www.scopus.com/inward/record.url?scp=85028088988&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2017.18
DO - 10.1109/IPDPSW.2017.18
M3 - Conference contribution
AN - SCOPUS:85028088988
T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
SP - 1408
EP - 1417
BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Y2 - 29 May 2017 through 2 June 2017
ER -