Abstract
Incomplete Sparse Approximate Inverses (ISAI) have recently been shown to be an attractive alternative to exact sparse triangular solves in the context of incomplete factorization preconditioning. In this paper, we propose a batched GPU kernel for the efficient generation of ISAI matrices. Using only thread-local memory allows the ISAI matrix to be computed with a very small memory footprint. We demonstrate that this strategy is faster than the existing strategy for generating ISAI matrices, and we use a large number of test matrices to assess the algorithm's efficiency in an iterative solver setting.
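As a rough illustration of what generating an ISAI matrix involves, the sketch below builds an approximate inverse M of a lower triangular matrix L one row at a time. It is not the paper's GPU kernel: it is a CPU-side NumPy sketch that assumes a dense array for L, assumes the sparsity pattern of M is taken from L, and uses the hypothetical helper name `isai_lower`. Each row leads to a small independent linear system, which is what makes the problem a natural fit for a batched routine.

```python
import numpy as np

def isai_lower(L, pattern=None):
    """Row-wise ISAI sketch for a lower triangular matrix L (dense array).

    Assumption: L has a nonzero diagonal, and the pattern contains the diagonal.
    """
    n = L.shape[0]
    if pattern is None:
        pattern = L != 0  # assumption: reuse the sparsity pattern of L for M
    M = np.zeros((n, n), dtype=float)
    for i in range(n):
        # Column indices where row i of M is allowed to be nonzero.
        J = np.flatnonzero(pattern[i])
        # Small dense system: M[i, J] @ L[J, J] = e_i restricted to J.
        small = L[np.ix_(J, J)]
        rhs = (J == i).astype(float)
        # Transpose to solve for the row vector as a column vector.
        M[i, J] = np.linalg.solve(small.T, rhs)
    return M

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 4.0, 5.0]])
M = isai_lower(L)
# M @ L matches the identity on the chosen pattern, but not necessarily elsewhere.
print(np.round(M @ L, 3))
```

The loop iterations do not depend on each other, so each small system could be assigned to its own thread or thread group. How the paper maps these systems to thread-local memory is not shown here.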
Original language | English |
---|---|
Title of host publication | Proceedings of ScalA 2016 |
Subtitle of host publication | 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 49-56 |
Number of pages | 8 |
ISBN (Electronic) | 9781509052226 |
State | Published - Jan 30 2017 |
Externally published | Yes |
Event | 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2016 - Salt Lake City, United States. Duration: Nov 13 2016 → Nov 18 2016 |
Publication series
Name | Proceedings of ScalA 2016: 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC16: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|
Conference
Conference | 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2016 |
---|---|
Country/Territory | United States |
City | Salt Lake City |
Period | 11/13/16 → 11/18/16 |
Funding
This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC0016564 and DE-SC0016513, and NVIDIA.
Keywords
- Batched routines
- GPU
- Incomplete Sparse Approximate Inverses
- Preconditioning