TY - JOUR
T1 - Accelerating Restarted GMRES with Mixed Precision Arithmetic
AU - Lindquist, Neil
AU - Luszczek, Piotr
AU - Dongarra, Jack
N1 - Publisher Copyright:
© 1990-2012 IEEE.
PY - 2022/4/1
Y1 - 2022/4/1
N2 - The generalized minimum residual method (GMRES) is a commonly used iterative Krylov solver for sparse, non-symmetric systems of linear equations. As with other iterative solvers, its run time is dominated by data movement. To improve performance, we propose running GMRES in reduced precision, with key operations remaining in full precision. Additionally, we provide theoretical results linking the convergence of finite-precision GMRES with classical Gram-Schmidt with reorthogonalization (CGSR) to that of its infinite-precision counterpart, which helps justify the convergence of this method to double-precision accuracy. We tested the mixed-precision approach with a variety of matrices and preconditioners on a GPU-accelerated node. Excluding the incomplete LU factorization without fill-in (ILU(0)) preconditioner, we achieved average speedups ranging from 8 to 61 percent relative to comparable double-precision implementations, with the simpler preconditioners achieving the higher speedups.
AB - The generalized minimum residual method (GMRES) is a commonly used iterative Krylov solver for sparse, non-symmetric systems of linear equations. As with other iterative solvers, its run time is dominated by data movement. To improve performance, we propose running GMRES in reduced precision, with key operations remaining in full precision. Additionally, we provide theoretical results linking the convergence of finite-precision GMRES with classical Gram-Schmidt with reorthogonalization (CGSR) to that of its infinite-precision counterpart, which helps justify the convergence of this method to double-precision accuracy. We tested the mixed-precision approach with a variety of matrices and preconditioners on a GPU-accelerated node. Excluding the incomplete LU factorization without fill-in (ILU(0)) preconditioner, we achieved average speedups ranging from 8 to 61 percent relative to comparable double-precision implementations, with the simpler preconditioners achieving the higher speedups.
KW - Linear systems
KW - multiple precision arithmetic
UR - http://www.scopus.com/inward/record.url?scp=85112440294&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2021.3090757
DO - 10.1109/TPDS.2021.3090757
M3 - Article
AN - SCOPUS:85112440294
SN - 1045-9219
VL - 33
SP - 1027
EP - 1037
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 4
ER -