The Design and Implementation of the Reduction Routines in ScaLAPACK

Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, Antoine P. Petitet, David W. Walker, R. Clint Whaley

Research output: Contribution to journal › Article › peer-review

Abstract

This chapter discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under development. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. The use of such algorithms helps reduce the message startup costs on distributed memory concurrent computers. Other key ideas in our approach are the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as computational building blocks, and the use of the Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks. Together, the distributed BLAS and the BLACS can be used to construct higher-level algorithms and hide many details of the parallelism from the application developer. The block-cyclic data distribution is described and adopted as a good way of distributing block-partitioned matrices. Block-partitioned versions of the Cholesky and LU factorizations are presented, and optimization issues associated with the implementation of the LU factorization algorithm on distributed memory concurrent computers are discussed, together with its performance on the Intel Delta system. Finally, approaches to the design of library interfaces are reviewed.
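As a minimal sketch of the block-cyclic data distribution the abstract describes, the mapping below shows how one dimension of a matrix is dealt out in blocks of size `nb` over `p` processes; the function names are illustrative only (ScaLAPACK itself provides analogous index-translation utilities), and the 2-D distribution applies this mapping independently to rows and columns.

```python
def owner(g, nb, p):
    """Process (within a p-process row or column of the grid) that
    owns global index g under a 1-D block-cyclic distribution with
    block size nb: blocks are dealt round-robin to processes."""
    return (g // nb) % p

def global_to_local(g, nb, p):
    """Local index of global index g on its owning process: the
    number of earlier blocks held locally, times nb, plus the
    offset of g within its block."""
    block = g // nb
    return (block // p) * nb + (g % nb)

# With nb = 2 and p = 3, global indices 0..11 fall into blocks
# 0..5, which are owned by processes 0,1,2,0,1,2 in turn.
print([owner(g, 2, 3) for g in range(12)])
print([global_to_local(g, 2, 3) for g in range(12)])
```

Cyclic dealing of small blocks is what gives the factorization algorithms their load balance as the active submatrix shrinks, while the block size `nb` keeps the Level 3 BLAS operating on cache-friendly panels.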

Original language: English
Pages (from-to): 177-202
Number of pages: 26
Journal: Advances in Parallel Computing
Volume: 10
Issue number: C
DOIs
State: Published - Jan 1 1995
