Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization

Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

This paper introduces several frameworks for the design and implementation of high performance GPU kernels that target batch workloads with irregular sizes. Such workloads are ubiquitous in many scientific applications, including sparse direct solvers, astrophysics, and quantum chemistry. The paper addresses two main categories of frameworks, taking the Cholesky factorization as a case study. The first uses host-side kernel launches, and the second uses device-side launches. Within each category, different design options are introduced, with an emphasis on the advantages and the disadvantages of each approach. Our best performing design outperforms the state-of-the-art CPU implementation, scoring up to 4.7× speedup in double precision on a Pascal P100 GPU.

Original languageEnglish
Title of host publication2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)9781538659892
DOIs
StatePublished - Nov 26 2018
Externally publishedYes
Event2018 IEEE High Performance Extreme Computing Conference, HPEC 2018 - Waltham, United States
Duration: Sep 25 2018Sep 27 2018

Publication series

Name2018 IEEE High Performance Extreme Computing Conference, HPEC 2018

Conference

Conference2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Country/TerritoryUnited States
CityWaltham
Period09/25/1809/27/18

Funding

This work is partially supported by NSF grant No. OAC-1740250 and CSR 1514286, NVIDIA, and by the Exascale Computing Project (17-SC-20-SC). This work is partially supported by NSF grant No. OAC-1740250 and CSR 1514286, NVIDIA, and by the Exascale Computing Project (17-SC-20-SC)

FundersFunder number
National Science Foundation
Center for Scientific Review1514286
Norsk SykepleierforbundOAC-1740250

    Keywords

    • Batch Linear Algebra
    • GPU Computing
    • Matrix Factorization

    Fingerprint

    Dive into the research topics of 'Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization'. Together they form a unique fingerprint.

    Cite this