TY - JOUR
T1 - Symmetric indefinite linear solver using OpenMP task on multicore architectures
AU - Yamazaki, Ichitaro
AU - Kurzak, Jakub
AU - Wu, Panruo
AU - Zounon, Mawussi
AU - Dongarra, Jack
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2018/8/1
Y1 - 2018/8/1
AB - The Open Multi-Processing (OpenMP) standard has recently incorporated task-based programming, in which a function call with its input and output data is treated as a task. At run time, OpenMP's superscalar scheduler tracks the data dependencies among the tasks and executes each task once its dependencies are resolved. On a shared-memory architecture with multiple cores, independent tasks are executed on different cores in parallel, thereby enabling parallel execution of seemingly sequential code. With the emergence of many-core architectures, this programming paradigm is gaining attention, not only because of its simplicity but also because it removes artificial synchronization points from the program and improves its thread-level parallelism. In this paper, we use these new OpenMP features to develop a portable, high-performance implementation of a dense symmetric indefinite linear solver. Obtaining high performance from this kind of solver is challenging because the symmetric pivoting required to maintain numerical stability leads to data dependencies that prevent us from using some common performance-improving techniques. We describe several techniques for fully utilizing a large number of cores through tasking while conforming to the OpenMP standard. Our performance results on current many-core architectures (including Intel's Broadwell, Intel's Knights Landing, IBM's Power8, and Arm's ARMv8) demonstrate the portable and superior performance of our implementation compared with the Linear Algebra PACKage (LAPACK). The resulting solver is now available as part of the PLASMA software package.
KW - Linear algebra
KW - Runtime
KW - Multithreading
KW - Symmetric indefinite matrices
UR - http://www.scopus.com/inward/record.url?scp=85042706554&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2018.2808964
DO - 10.1109/TPDS.2018.2808964
M3 - Article
AN - SCOPUS:85042706554
SN - 1045-9219
VL - 29
SP - 1879
EP - 1892
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 8
ER -