TY - GEN
T1 - Optimizing GPU Kernels for Irregular Batch Workloads
T2 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
AU - Abdelfattah, Ahmad
AU - Haidar, Azzam
AU - Tomov, Stanimire
AU - Dongarra, Jack
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - This paper introduces several frameworks for the design and implementation of high performance GPU kernels that target batch workloads with irregular sizes. Such workloads are ubiquitous in many scientific applications, including sparse direct solvers, astrophysics, and quantum chemistry. The paper addresses two main categories of frameworks, taking the Cholesky factorization as a case study. The first uses host-side kernel launches, and the second uses device-side launches. Within each category, different design options are introduced, with an emphasis on the advantages and the disadvantages of each approach. Our best performing design outperforms the state-of-the-art CPU implementation, scoring up to 4.7× speedup in double precision on a Pascal P100 GPU.
AB - This paper introduces several frameworks for the design and implementation of high performance GPU kernels that target batch workloads with irregular sizes. Such workloads are ubiquitous in many scientific applications, including sparse direct solvers, astrophysics, and quantum chemistry. The paper addresses two main categories of frameworks, taking the Cholesky factorization as a case study. The first uses host-side kernel launches, and the second uses device-side launches. Within each category, different design options are introduced, with an emphasis on the advantages and the disadvantages of each approach. Our best performing design outperforms the state-of-the-art CPU implementation, scoring up to 4.7× speedup in double precision on a Pascal P100 GPU.
KW - Batch Linear Algebra
KW - GPU Computing
KW - Matrix Factorization
UR - http://www.scopus.com/inward/record.url?scp=85060088912&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2018.8547576
DO - 10.1109/HPEC.2018.8547576
M3 - Conference contribution
AN - SCOPUS:85060088912
T3 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
BT - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 September 2018 through 27 September 2018
ER -