Autotuning batch Cholesky factorization in CUDA with interleaved layout of matrices

Mark Gates, Jakub Kurzak, Piotr Luszczek, Yu Pei, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

Batch matrix operations address the case of solving the same linear algebra problem for a very large number of very small matrices. In this paper, we focus on implementing the batch Cholesky factorization in CUDA, in single precision arithmetic, for NVIDIA GPUs. Specifically, we look into the benefits of using noncanonical data layouts, where consecutive memory locations store elements with the same row and column index in a set of consecutive matrices. We discuss a number of different implementation options and tuning parameters. We demonstrate superior performance to traditional implementations for the case of very small matrices.
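
The interleaved layout described in the abstract can be illustrated with a short sketch. This is not the authors' CUDA kernel: it is a plain Python stand-in, in double rather than single precision, and the names `interleaved_index`, `to_interleaved`, and `batch_cholesky_interleaved` are hypothetical. Column-major ordering of the (row, column) pairs is also an assumption. The one property taken from the abstract is that element (i, j) of consecutive matrices sits in consecutive memory locations.

```python
import math


def interleaved_index(i, j, k, n, batch):
    """Flat offset of element (i, j) of matrix k; the batch index k varies fastest.

    Assumption: the (i, j) pairs themselves are ordered column-major.
    """
    return (j * n + i) * batch + k


def to_interleaved(matrices, n):
    """Pack a list of n-by-n matrices (lists of rows) into one flat interleaved buffer."""
    batch = len(matrices)
    buf = [0.0] * (n * n * batch)
    for k, a in enumerate(matrices):
        for i in range(n):
            for j in range(n):
                buf[interleaved_index(i, j, k, n, batch)] = float(a[i][j])
    return buf


def batch_cholesky_interleaved(buf, n, batch):
    """Overwrite the lower triangle of every matrix with its Cholesky factor L (A = L L^T).

    The inner loops over k stand in for one GPU thread per matrix: for a fixed
    (i, j), consecutive k read and write consecutive buffer entries.
    The strict upper triangle is left untouched.
    """
    def idx(i, j, k):
        return interleaved_index(i, j, k, n, batch)

    for j in range(n):
        # Diagonal entry of column j for every matrix in the batch.
        for k in range(batch):
            d = buf[idx(j, j, k)]
            for p in range(j):
                d -= buf[idx(j, p, k)] ** 2
            buf[idx(j, j, k)] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            for k in range(batch):
                s = buf[idx(i, j, k)]
                for p in range(j):
                    s -= buf[idx(i, p, k)] * buf[idx(j, p, k)]
                buf[idx(i, j, k)] = s / buf[idx(j, j, k)]
    return buf
```

For a batch of two 2-by-2 matrices, `to_interleaved([[[4, 2], [2, 3]], [[9, 3], [3, 5]]], 2)` places the two (0, 0) entries side by side at the start of the buffer, followed by the two (1, 0) entries, and so on. In a GPU setting this adjacency is what would let neighbouring threads, each handling one matrix, access neighbouring addresses.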

Original language: English
Title of host publication: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1408-1417
Number of pages: 10
ISBN (Electronic): 9781538634080
State: Published - Jun 30 2017
Externally published: Yes
Event: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017 - Orlando, United States
Duration: May 29 2017 - Jun 2 2017

Publication series

Name: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Conference

Conference: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Country/Territory: United States
City: Orlando
Period: 05/29/17 - 06/2/17

Funding

Autotuning of Computational Kernels” from the National Science Foundation.

Funders: National Science Foundation

    Keywords

    • Cholesky factorization
    • GPU computing
    • batch computation
    • data layout
    • numerical linear algebra
