Performance tuning and optimization techniques of fixed and variable size batched Cholesky factorization on GPUs

Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Conference article › peer-review

12 Scopus citations

Abstract

Solving a large number of relatively small linear systems has recently drawn more attention in the HPC community, due to the importance of such computational workloads in many scientific applications, including sparse multifrontal solvers. Modern hardware accelerators and their architecture require a set of optimization techniques that are very different from the ones used in solving one relatively large matrix. In order to impose concurrency on such throughput-oriented architectures, a common practice is to batch the solution of these matrices as one task offloaded to the underlying hardware, rather than solving them individually. This paper presents a high performance batched Cholesky factorization on large sets of relatively small matrices using Graphics Processing Units (GPUs), and addresses both fixed and variable size batched problems. We investigate various algorithm designs and optimization techniques, and show that it is essential to combine kernel design with performance tuning in order to achieve the best possible performance. We compare our approaches against state-of-the-art CPU solutions as well as GPU-based solutions using existing libraries, and show that, on a K40c GPU for example, our kernels are more than 2× faster.
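
As an illustration only, the batched interface described in the abstract can be sketched on the CPU in plain Python. This is not the authors' GPU kernel design and makes no claim about its performance; the function names `cholesky_lower` and `batched_cholesky` and the sample matrices are hypothetical. The sketch only shows what "one task for many small matrices" means for fixed size batches (all matrices share one dimension) and variable size batches (dimensions differ within the batch).

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T == a.

    `a` is a symmetric positive definite matrix given as a list of rows.
    Hypothetical helper: a textbook unblocked factorization, not a tuned kernel.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: remove the contribution of columns already factored.
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def batched_cholesky(batch):
    """Factor every matrix in `batch` through a single call.

    Assumption for illustration: the batch is a list of independent matrices.
    On a GPU the independent factorizations would run concurrently; here they
    are simply looped over.
    """
    return [cholesky_lower(a) for a in batch]


# Fixed size batch: every matrix is 2 x 2.
fixed_batch = [
    [[4.0, 2.0], [2.0, 3.0]],
    [[9.0, 3.0], [3.0, 5.0]],
]

# Variable size batch: 1 x 1, 2 x 2, and 3 x 3 matrices in one call.
variable_batch = [
    [[4.0]],
    [[4.0, 2.0], [2.0, 3.0]],
    [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]],
]

fixed_factors = batched_cholesky(fixed_batch)
variable_factors = batched_cholesky(variable_batch)
```

In the variable size case the caller must carry a size per matrix (implicit here in each list's length), which is one reason the two problem classes are usually handled by separate interfaces.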

Original language: English
Pages (from-to): 119-130
Number of pages: 12
Journal: Procedia Computer Science
Volume: 80
State: Published - 2016
Event: International Conference on Computational Science, ICCS 2016 - San Diego, United States
Duration: Jun 6 2016 to Jun 8 2016

Funding

This material is based on work supported by NSF under Grants No. CSR 1514286 and ACI-1339822, NVIDIA, and in part by the Russian Scientific Foundation, Agreement N14-11-00190.

Funders (funder number)
National Science Foundation: CSR 1514286, ACI-1339822
NVIDIA
Russian Science Foundation: N14-11-00190

    Keywords

    • Batched computation
    • Cholesky factorization
    • GPUs
    • Tuning
