TY - GEN
T1 - Domain overlap for iterative sparse triangular solves on GPUs
AU - Anzt, Hartwig
AU - Chow, Edmond
AU - Szyld, Daniel B.
AU - Dongarra, Jack
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
AB - Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
UR - http://www.scopus.com/inward/record.url?scp=84989941466&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-40528-5_24
DO - 10.1007/978-3-319-40528-5_24
M3 - Conference contribution
AN - SCOPUS:84989941466
SN - 9783319405261
T3 - Lecture Notes in Computational Science and Engineering
SP - 527
EP - 545
BT - Software for Exascale Computing - SPPEXA 2013-2015
A2 - Nagel, Wolfgang E.
A2 - Bungartz, Hans-Joachim
A2 - Neumann, Philipp
PB - Springer Verlag
T2 - International Conference on Software for Exascale Computing, SPPEXA 2015
Y2 - 25 January 2016 through 27 January 2016
ER -