TY - GEN
T1 - Tile QR factorization with parallel panel processing for multicore architectures
AU - Hadri, Bilel
AU - Ltaief, Hatem
AU - Agullo, Emmanuel
AU - Dongarra, Jack
PY - 2010
Y1 - 2010
N2 - To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs a panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines), to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
AB - To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs a panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines), to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
KW - Communication avoiding
KW - Dynamic scheduling
KW - Multicore
KW - QR factorization
KW - Tile algorithms
UR - http://www.scopus.com/inward/record.url?scp=77953977688&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2010.5470443
DO - 10.1109/IPDPS.2010.5470443
M3 - Conference contribution
AN - SCOPUS:77953977688
SN - 9781424464432
T3 - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010
BT - Proceedings of the 2010 IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010
T2 - 24th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2010
Y2 - 19 April 2010 through 23 April 2010
ER -