Abstract
This paper presents a portable and performance-efficient approach to solving a batch of linear systems of equations using Graphics Processing Units (GPUs). Each system is represented by a band matrix, that is, a matrix whose nonzero entries lie in a band above and/or below the diagonal. Each matrix is factorized using an LU factorization with partial pivoting for numerical stability. Subsequently, the factors are used to find the solution for as many right-hand sides as needed. The width of the band is often small enough that performing a fully dense LU factorization results in poor performance. We follow the standard LAPACK specifications for addressing this type of problem and develop a dedicated solver that runs efficiently on GPUs. No similar solver is currently available in the vendors' software stacks, so performance results are shown on both NVIDIA and AMD GPUs relative to a parallel CPU solution utilizing OpenMP for thread-level parallelization.
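The factor-then-solve workflow described in the abstract can be illustrated with a minimal CPU sketch in plain Python. This is not the paper's GPU implementation: the function names (`band_lu_factor`, `band_lu_solve`, `solve_batch`) are hypothetical, each matrix is kept in ordinary dense row storage rather than the LAPACK band storage layout, and the batch is processed by a sequential loop instead of GPU kernels. It assumes square matrices with `kl` sub-diagonals and `ku` super-diagonals, and it only touches entries inside the band.

```python
def band_lu_factor(a, kl, ku):
    """LU-factorize a square band matrix in place with partial pivoting.

    `a` is a dense list of rows (an assumption made for readability).
    Row interchanges can widen the upper band of U from ku to kl + ku.
    Multipliers are stored below the diagonal and are not interchanged.
    """
    n = len(a)
    piv = [0] * n
    for k in range(n):
        last_row = min(n - 1, k + kl)       # rows that can be nonzero in column k
        last_col = min(n - 1, k + kl + ku)  # columns reachable after fill-in
        # Partial pivoting: largest magnitude entry in column k within the band.
        p = max(range(k, last_row + 1), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            for j in range(k, last_col + 1):
                a[k][j], a[p][j] = a[p][j], a[k][j]
        for i in range(k + 1, last_row + 1):
            a[i][k] /= a[k][k]              # multiplier (entry of L)
            for j in range(k + 1, last_col + 1):
                a[i][j] -= a[i][k] * a[k][j]
    return piv


def band_lu_solve(lu, piv, kl, ku, b):
    """Solve A x = b from the factors; reusable for any number of right-hand sides."""
    n = len(lu)
    x = list(b)
    # Forward pass: replay the row interchanges and eliminations on b.
    for k in range(n):
        p = piv[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, min(n - 1, k + kl) + 1):
            x[i] -= lu[i][k] * x[k]
    # Back substitution with U, whose upper bandwidth is at most kl + ku.
    for i in reversed(range(n)):
        last_col = min(n - 1, i + kl + ku)
        s = sum(lu[i][j] * x[j] for j in range(i + 1, last_col + 1))
        x[i] = (x[i] - s) / lu[i][i]
    return x


def solve_batch(mats, rhs, kl, ku):
    """Factor and solve each independent system of the batch in turn."""
    out = []
    for a, b in zip(mats, rhs):
        lu = [[float(v) for v in row] for row in a]  # work on a copy
        piv = band_lu_factor(lu, kl, ku)
        out.append(band_lu_solve(lu, piv, kl, ku, b))
    return out


# Example batch: two small tridiagonal systems (kl = ku = 1).
batch_a = [
    [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 2]],  # zero on the diagonal forces a row swap
]
batch_b = [[0, 0, 0, 5], [1, 3, 3]]
solutions = solve_batch(batch_a, batch_b, kl=1, ku=1)
```

Keeping the factorization and the solve as separate steps mirrors the split between the LAPACK routines `gbtrf` and `gbtrs`, so the factors of one matrix can be reused for further right-hand sides. Because the systems in a batch are independent, the loop in `solve_batch` is the part a GPU or OpenMP implementation would run in parallel.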
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
| Publisher | Association for Computing Machinery |
| Pages | 1672-1679 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400707858 |
| State | Published - Nov 12 2023 |
| Event | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States. Duration: Nov 12 2023 → Nov 17 2023 |
Publication series
| Name | ACM International Conference Proceeding Series |
|---|---|
Conference
| Conference | 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 |
|---|---|
| Country/Territory | United States |
| City | Denver |
| Period | 11/12/23 → 11/17/23 |
Funding
This research is supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
Keywords
- Band matrix
- GPU computing
- LU factorization
- batch solvers
- performance portability