TY - GEN
T1 - Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures
AU - Agullo, Emmanuel
AU - Bouwmeester, Henricus
AU - Dongarra, Jack
AU - Kurzak, Jakub
AU - Langou, Julien
AU - Rosenberg, Lee
PY - 2011
Y1 - 2011
N2 - The algorithms in the current sequential numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multicore architectures. A new family of algorithms, the tile algorithms, has recently been introduced. Previous research has shown that it is possible to write efficient and scalable tile algorithms for performing a Cholesky factorization, a (pseudo) LU factorization, a QR factorization, and computing the inverse of a symmetric positive definite matrix. In this extended abstract, we revisit the computation of the inverse of a symmetric positive definite matrix. We observe that, using a dynamic task scheduler, it is relatively painless to translate existing LAPACK code to obtain a ready-to-be-executed tile algorithm. However, we demonstrate that, for some variants, non-trivial compiler techniques (array renaming, loop reversal, and pipelining) then need to be applied to further increase the parallelism of the application. We present preliminary experimental results.
UR - http://www.scopus.com/inward/record.url?scp=79952578568&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19328-6_14
DO - 10.1007/978-3-642-19328-6_14
M3 - Conference contribution
AN - SCOPUS:79952578568
SN - 9783642193279
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 129
EP - 138
BT - High Performance Computing for Computational Science, VECPAR 2010 - 9th International Conference, Revised Selected Papers
T2 - 9th International Conference on High Performance Computing for Computational Science, VECPAR 2010
Y2 - 22 June 2010 through 25 June 2010
ER -