TY - GEN
T1 - ScaLAPACK tutorial
AU - Dongarra, Jack
AU - Petitet, Antoine
N1 - Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 1996.
PY - 1996
Y1 - 1996
N2 - This ScaLAPACK tutorial begins with a brief description of the LAPACK library. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. By relying on the Basic Linear Algebra Subprograms (BLAS), it is possible to develop portable and efficient implementations of these algorithms across a wide range of architectures, with emphasis on workstations, vector processors and shared-memory computers, as has been done in LAPACK. The ScaLAPACK library, which is a distributed-memory version of LAPACK, is then presented. A key idea in our approach is the use of the Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks and the use of a distributed version of the BLAS, the Parallel Basic Linear Algebra Subprograms (PBLAS), as computational building blocks. The features of the BLACS and PBLAS are in turn outlined, and it is shown how these building blocks can be used to construct higher-level algorithms and hide many details of the parallelism from the application developer. Performance results of ScaLAPACK routines are presented, validating the adoption of the block-cyclic decomposition scheme as a way of distributing block-partitioned matrices that yields well-balanced computations and scalable implementations. Finally, future directions for the ScaLAPACK library are described, and alternative approaches to mathematical libraries are suggested that could integrate ScaLAPACK into efficient and user-friendly distributed systems.
AB - This ScaLAPACK tutorial begins with a brief description of the LAPACK library. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. By relying on the Basic Linear Algebra Subprograms (BLAS), it is possible to develop portable and efficient implementations of these algorithms across a wide range of architectures, with emphasis on workstations, vector processors and shared-memory computers, as has been done in LAPACK. The ScaLAPACK library, which is a distributed-memory version of LAPACK, is then presented. A key idea in our approach is the use of the Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks and the use of a distributed version of the BLAS, the Parallel Basic Linear Algebra Subprograms (PBLAS), as computational building blocks. The features of the BLACS and PBLAS are in turn outlined, and it is shown how these building blocks can be used to construct higher-level algorithms and hide many details of the parallelism from the application developer. Performance results of ScaLAPACK routines are presented, validating the adoption of the block-cyclic decomposition scheme as a way of distributing block-partitioned matrices that yields well-balanced computations and scalable implementations. Finally, future directions for the ScaLAPACK library are described, and alternative approaches to mathematical libraries are suggested that could integrate ScaLAPACK into efficient and user-friendly distributed systems.
UR - http://www.scopus.com/inward/record.url?scp=33144486726&partnerID=8YFLogxK
U2 - 10.1007/3-540-60902-4_20
DO - 10.1007/3-540-60902-4_20
M3 - Conference contribution
AN - SCOPUS:33144486726
SN - 3540609024
SN - 9783540609025
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 166
EP - 176
BT - Applied Parallel Computing
A2 - Dongarra, Jack
A2 - Madsen, Kaj
A2 - Waśniewski, Jerzy
PB - Springer Verlag
T2 - 2nd International Workshop on Applied Parallel Computing in Computations in Physics, Chemistry and Engineering Science, PARA 1995
Y2 - 21 August 1995 through 24 August 1995
ER -