TY - JOUR
T1 - The design of a parallel dense linear algebra software library
T2 - Reduction to Hessenberg, tridiagonal, and bidiagonal form
AU - Choi, Jaeyoung
AU - Dongarra, Jack J.
AU - Walker, David W.
PY - 1995/9
Y1 - 1995/9
N2 - This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms. These routines are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines. Results are presented that demonstrate the scalability of the reduction routines. The most commonly used building blocks in ScaLAPACK are the sequential BLAS, the parallel BLAS (PBLAS), and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms consists of a series of steps, in each of which one block column (or panel), and/or block row, of the matrix is reduced, followed by an update of the portion of the matrix that has not been factorized so far. This latter phase is performed using Level 3 PBLAS operations and contains the bulk of the computation. However, the panel reduction phase involves a significant amount of communication, and is important in determining the scalability of the algorithm. The simplest way to parallelize the panel reduction phase is to replace the BLAS routines appearing in the LAPACK routine (mostly matrix-vector and matrix-matrix multiplications) with the corresponding PBLAS routines. However, in some cases it is possible to reduce communication startup costs by performing the communication necessary for consecutive BLAS operations in a single communication using a BLACS call. Thus, there is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code.
AB - This paper discusses issues in the design of ScaLAPACK, a software library for performing dense linear algebra computations on distributed memory concurrent computers. These issues are illustrated using the ScaLAPACK routines for reducing matrices to Hessenberg, tridiagonal, and bidiagonal forms. These routines are important in the solution of eigenproblems. The paper focuses on how building blocks are used to create higher-level library routines. Results are presented that demonstrate the scalability of the reduction routines. The most commonly used building blocks in ScaLAPACK are the sequential BLAS, the parallel BLAS (PBLAS), and the Basic Linear Algebra Communication Subprograms (BLACS). Each of the matrix reduction algorithms consists of a series of steps, in each of which one block column (or panel), and/or block row, of the matrix is reduced, followed by an update of the portion of the matrix that has not been factorized so far. This latter phase is performed using Level 3 PBLAS operations and contains the bulk of the computation. However, the panel reduction phase involves a significant amount of communication, and is important in determining the scalability of the algorithm. The simplest way to parallelize the panel reduction phase is to replace the BLAS routines appearing in the LAPACK routine (mostly matrix-vector and matrix-matrix multiplications) with the corresponding PBLAS routines. However, in some cases it is possible to reduce communication startup costs by performing the communication necessary for consecutive BLAS operations in a single communication using a BLACS call. Thus, there is a tradeoff between efficiency and software engineering considerations, such as ease of programming and simplicity of code.
UR - http://www.scopus.com/inward/record.url?scp=0005998639&partnerID=8YFLogxK
U2 - 10.1007/BF02140776
DO - 10.1007/BF02140776
M3 - Article
AN - SCOPUS:0005998639
SN - 1017-1398
VL - 10
SP - 379
EP - 399
JO - Numerical Algorithms
JF - Numerical Algorithms
IS - 2
ER -