TY - GEN
T1 - A block-asynchronous relaxation method for graphics processing units
AU - Anzt, Hartwig
AU - Tomov, Stanimire
AU - Dongarra, Jack
AU - Heuveline, Vincent
PY - 2012
Y1 - 2012
N2 - In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution time. Analyzing the results, we observe that even for our most basic asynchronous relaxation scheme, despite its lower convergence rate compared to the Gauss-Seidel relaxation (that we expected), the asynchronous iteration running on GPUs is still able to provide solution approximations of certain accuracy in considerably shorter time than Gauss-Seidel running on CPUs. Hence, it overcompensates for the slower convergence by exploiting the scalability and the good fit of the asynchronous schemes for the highly parallel GPU architectures. Further, enhancing the most basic asynchronous approach with hybrid schemes - using multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the "boundaries", subject to tuning various parameters - we manage to not only recover the loss of global convergence but often accelerate convergence of up to two times (compared to the standard but difficult to parallelize Gauss-Seidel type of schemes), while keeping the execution time of a global iteration practically the same. This shows the high potential of the asynchronous methods not only as a stand alone numerical solver for linear systems of equations fulfilling certain convergence conditions but more importantly as a smoother in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance and the need for asynchronous methods, as the ones described in this work, is expected to grow.
AB - In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution time. Analyzing the results, we observe that even for our most basic asynchronous relaxation scheme, despite its lower convergence rate compared to the Gauss-Seidel relaxation (that we expected), the asynchronous iteration running on GPUs is still able to provide solution approximations of certain accuracy in considerably shorter time than Gauss-Seidel running on CPUs. Hence, it overcompensates for the slower convergence by exploiting the scalability and the good fit of the asynchronous schemes for the highly parallel GPU architectures. Further, enhancing the most basic asynchronous approach with hybrid schemes - using multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the "boundaries", subject to tuning various parameters - we manage to not only recover the loss of global convergence but often accelerate convergence of up to two times (compared to the standard but difficult to parallelize Gauss-Seidel type of schemes), while keeping the execution time of a global iteration practically the same. This shows the high potential of the asynchronous methods not only as a stand alone numerical solver for linear systems of equations fulfilling certain convergence conditions but more importantly as a smoother in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance and the need for asynchronous methods, as the ones described in this work, is expected to grow.
KW - Asynchronous Relaxation
KW - Chaotic Iteration
KW - Graphics Processing Units (GPUs)
KW - Jacobi Method
UR - http://www.scopus.com/inward/record.url?scp=84867426991&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2012.11
DO - 10.1109/IPDPSW.2012.11
M3 - Conference contribution
AN - SCOPUS:84867426991
SN - 9780769546766
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
SP - 113
EP - 124
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Y2 - 21 May 2012 through 25 May 2012
ER -