GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure

  • Ahmad Abdelfattah
  • , Stanimire Tomov
  • , Piotr Luszczek
  • , Hartwig Anzt
  • , Jack Dongarra

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

This paper presents a portable and performance-efficient approach to solve a batch of linear systems of equations using Graphics Processing Units (GPUs). Each system is represented using a special type of matrices with a band structure above and/or below the diagonal. Each matrix is factorized using an LU factorization with partial pivoting for numerical stability. Subsequently, the factors are used to find the solution for as many right hand sides as needed. The width of the band is often small enough that performing a fully dense LU factorization results in poor performance. We follow the standard LAPACK specifications for addressing this type of problems and develop a dedicated solver that runs efficiently on GPUs. No similar solver is currently available in the vendor's software stack, so performance results are shown on both NVIDIA and AMD GPUs relative to a parallel CPU solution utilizing OpenMP for thread-level parallelization.

Original languageEnglish
Title of host publicationProceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
PublisherAssociation for Computing Machinery
Pages1672-1679
Number of pages8
ISBN (Electronic)9798400707858
DOIs
StatePublished - Nov 12 2023
Event2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States
Duration: Nov 12 2023Nov 17 2023

Publication series

NameACM International Conference Proceeding Series

Conference

Conference2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Country/TerritoryUnited States
CityDenver
Period11/12/2311/17/23

Funding

This research is supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

Keywords

  • Band matrix
  • GPU computing
  • LU factorization
  • batch solvers
  • performance portability

Fingerprint

Dive into the research topics of 'GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure'. Together they form a unique fingerprint.

Cite this