TY - GEN
T1 - Least squares solvers for distributed-memory machines with GPU accelerators
AU - Kurzak, Jakub
AU - Gates, Mark
AU - Charara, Ali
AU - Yarkhan, Asim
AU - Dongarra, Jack
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/6/26
Y1 - 2019/6/26
N2 - This work presents an implementation of a linear least squares solver for distributed-memory machines with GPU accelerators, developed as part of the Software for Linear Algebra Targeting Exascale (SLATE) package. From the algorithmic standpoint, the work leverages recent advances in dense linear algebra, specifically the communication-avoiding QR factorization. From the implementation standpoint, the work represents a sharp departure from the traditional conventions established by legacy packages, such as LAPACK and ScaLAPACK. It is based on representing the matrix as a collection of individual tiles, and using batch operations for offloading work to accelerators. The article lays out the principles of the new approach, discusses the implementation details and presents the performance results.
KW - Distributed memory
KW - Least squares
KW - Linear algebra
UR - https://www.scopus.com/pages/publications/85074525511
U2 - 10.1145/3330345.3330356
DO - 10.1145/3330345.3330356
M3 - Conference contribution
AN - SCOPUS:85074525511
T3 - Proceedings of the International Conference on Supercomputing
SP - 117
EP - 126
BT - ICS 2019 - International Conference on Supercomputing
PB - Association for Computing Machinery
T2 - 33rd ACM International Conference on Supercomputing, ICS 2019, held in conjunction with the Federated Computing Research Conference, FCRC 2019
Y2 - 26 June 2019
ER -