Abstract
Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
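To make the idea in the abstract concrete, the following is a minimal sketch of a block-Jacobi sweep for a lower triangular system with one-sided overlap. It is not the authors' GPU implementation: it is synchronous rather than asynchronous, uses dense pure-Python lists, and assumes that "directed" overlap for a lower triangular matrix means extending each subdomain toward the earlier rows it depends on, with only the subdomain's own rows written back. The function names, the bidiagonal test matrix, and the parameters `block_size`, `overlap`, and `sweeps` are all hypothetical choices made for illustration.

```python
def forward_substitution(L, b):
    """Exact solve of L x = b for a dense lower triangular L (reference)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i] - sum(L[i][j] * x[j] for j in range(i))
        x[i] = s / L[i][i]
    return x


def block_jacobi_lower(L, b, block_size, overlap, sweeps):
    """Synchronous block-Jacobi sweeps for lower triangular L.

    Assumption: each block [start, end) is extended by `overlap` rows
    toward smaller indices; only rows in [start, end) are written back.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(sweeps):
        x_new = x[:]
        for start in range(0, n, block_size):
            end = min(start + block_size, n)
            ext_start = max(0, start - overlap)
            local = {}  # values recomputed inside the extended block
            for i in range(ext_start, end):
                s = b[i]
                for j in range(i):
                    if L[i][j] != 0.0:
                        # Inside the extended block: use the fresh local value.
                        # Outside: use the value from the previous sweep.
                        s -= L[i][j] * (local[j] if j >= ext_start else x[j])
                local[i] = s / L[i][i]
            for i in range(start, end):  # restricted write-back
                x_new[i] = local[i]
        x = x_new
    return x


def max_error(x, ref):
    return max(abs(a - r) for a, r in zip(x, ref))


# Hypothetical test problem: bidiagonal L whose exact solution is all ones.
n = 8
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    L[i][i] = 2.0
    if i > 0:
        L[i][i - 1] = -1.0
b = [2.0] + [1.0] * (n - 1)
exact = forward_substitution(L, b)

err_plain = max_error(block_jacobi_lower(L, b, 2, 0, 1), exact)
err_overlap = max_error(block_jacobi_lower(L, b, 2, 1, 1), exact)
print("max error after one sweep, no overlap:", err_plain)
print("max error after one sweep, overlap 1: ", err_overlap)
```

On this small example a single sweep with one row of overlap leaves a smaller error than a sweep without overlap, and both variants reproduce the forward-substitution result once the number of sweeps reaches the number of blocks. Whether overlap pays off in practice depends on the matrix and on the extra work per subdomain, which is the trade-off the paper studies.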
Original language | English |
---|---|
Title of host publication | Software for Exascale Computing - SPPEXA 2013-2015 |
Editors | Wolfgang E. Nagel, Hans-Joachim Bungartz, Philipp Neumann |
Publisher | Springer Verlag |
Pages | 527-545 |
Number of pages | 19 |
ISBN (Print) | 9783319405261 |
State | Published - 2016 |
Externally published | Yes |
Event | International Conference on Software for Exascale Computing, SPPEXA 2015 - Munich, Germany; Duration: Jan 25 2016 → Jan 27 2016 |
Publication series
Name | Lecture Notes in Computational Science and Engineering |
---|---|
Volume | 113 |
ISSN (Print) | 1439-7358 |
Conference
Conference | International Conference on Software for Exascale Computing, SPPEXA 2015 |
---|---|
Country/Territory | Germany |
City | Munich |
Period | 01/25/16 → 01/27/16 |
Funding
This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC-0012538 and DE-SC-0010042. Daniel B. Szyld was supported in part by the U.S. National Science Foundation under grant DMS-1418882. Support from NVIDIA is also gratefully acknowledged.