Abstract
This work presents two implementations of linear solvers for distributed-memory machines with GPU accelerators—one based on the Cholesky factorization and one based on the LU factorization with partial pivoting. The routines are developed as part of the Software for Linear Algebra Targeting Exascale (SLATE) package, which represents a sharp departure from the traditional conventions established by legacy packages, such as LAPACK and ScaLAPACK. The article lays out the principles of the new approach, discusses the implementation details, and presents the performance results.
Original language | English |
---|---|
Title of host publication | Euro-Par 2019 |
Subtitle of host publication | Parallel Processing - 25th International Conference on Parallel and Distributed Computing, Proceedings |
Editors | Ramin Yahyapour |
Publisher | Springer |
Pages | 495-506 |
Number of pages | 12 |
ISBN (Print) | 9783030293994 |
State | Published - 2019 |
Event | 25th International European Conference on Parallel and Distributed Computing, Euro-Par 2019 - Göttingen, Germany; Duration: Aug 26 2019 → Aug 30 2019 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11725 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 25th International European Conference on Parallel and Distributed Computing, Euro-Par 2019 |
---|---|
Country/Territory | Germany |
City | Göttingen |
Period | 08/26/19 → 08/30/19 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation’s exascale computing imperative.
Keywords
- Cholesky factorization
- Distributed memory
- GPU acceleration
- LU factorization
- Linear algebra
- Linear systems of equations