TY - GEN
T1 - Bidiagonalization and R-Bidiagonalization
T2 - 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017
AU - Faverge, Mathieu
AU - Langou, Julien
AU - Robert, Yves
AU - Dongarra, Jack
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - We study tiled algorithms for going from a 'full' matrix to a condensed 'band bidiagonal' form using orthog-onal transformations: (i) the tiled bidiagonalization algorithm BIDIAG, which is a tiled version of the standard scalar bidiago-nalization algorithm; and (ii) the R-bidiagonalization algorithm R-BIDIAG, which is a tiled version of the algorithm which consists in first performing the QR factorization of the initial matrix, then performing the band-bidiagonalization of the R-factor. For both BIDIAG and R-BIDIAG, we use four main types of reduction trees, namely FLATTS, FLATTT, GREEDY, and a newly introduced auto-adaptive tree, AUTO. We provide a study of critical path lengths for these tiled algorithms, which shows that (i) R-BIDIAG has a shorter critical path length than BIDIAG for tall and skinny matrices, and (ii) GREEDY based schemes are much better than earlier proposed algorithms with unbounded resources. We provide experiments on a single multicore node, and on a few multicore nodes of a parallel distributed shared-memory system, to show the superiority of the new algorithms on a variety of matrix sizes, matrix shapes and core counts.
AB - We study tiled algorithms for going from a 'full' matrix to a condensed 'band bidiagonal' form using orthog-onal transformations: (i) the tiled bidiagonalization algorithm BIDIAG, which is a tiled version of the standard scalar bidiago-nalization algorithm; and (ii) the R-bidiagonalization algorithm R-BIDIAG, which is a tiled version of the algorithm which consists in first performing the QR factorization of the initial matrix, then performing the band-bidiagonalization of the R-factor. For both BIDIAG and R-BIDIAG, we use four main types of reduction trees, namely FLATTS, FLATTT, GREEDY, and a newly introduced auto-adaptive tree, AUTO. We provide a study of critical path lengths for these tiled algorithms, which shows that (i) R-BIDIAG has a shorter critical path length than BIDIAG for tall and skinny matrices, and (ii) GREEDY based schemes are much better than earlier proposed algorithms with unbounded resources. We provide experiments on a single multicore node, and on a few multicore nodes of a parallel distributed shared-memory system, to show the superiority of the new algorithms on a variety of matrix sizes, matrix shapes and core counts.
KW - Bidiagonalization
KW - R-bidiagonalization
KW - auto-adaptive reduction tree
KW - critical path
KW - greedy algorithms
UR - http://www.scopus.com/inward/record.url?scp=85027684631&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2017.46
DO - 10.1109/IPDPS.2017.46
M3 - Conference contribution
AN - SCOPUS:85027684631
T3 - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
SP - 668
EP - 677
BT - Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 29 May 2017 through 2 June 2017
ER -