## Abstract

This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for multiple instruction multiple data (MIMD) distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under development. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. The use of such algorithms helps reduce the message startup costs on distributed memory concurrent computers. Other key ideas in our approach are the use of distributed versions of the Level 2 and Level 3 basic linear algebra subprograms (BLAS) as computational building blocks, and the use of basic linear algebra communication subprograms (BLACS) as communication building blocks. Together the distributed BLAS and the BLACS can be used to construct higher-level algorithms, and hide many details of the parallelism from the application developer. The block-cyclic data distribution is described and adopted as a good way of distributing block-partitioned matrices. Block-partitioned versions of the Cholesky and LU factorizations are presented, and optimization issues associated with the implementation of the LU factorization algorithm on distributed memory concurrent computers are discussed, together with its performance on the Intel Delta system. Finally, approaches to the design of library interfaces are reviewed.
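
As an illustration of the block-cyclic data distribution mentioned above, the following is a minimal sketch, not code from the paper or from ScaLAPACK. It assumes zero-based indices, a process grid whose first block is owned by process (0, 0), and hypothetical helper names (`owner`, `local_index`, `owner_grid`) and parameter names (`mb`, `nb`, `p_rows`, `p_cols`) chosen here for clarity.

```python
def owner(i, j, mb, nb, p_rows, p_cols):
    """Return the (process row, process column) that owns global entry (i, j).

    The matrix is cut into mb-by-nb blocks, and the blocks are dealt out
    cyclically over a p_rows-by-p_cols process grid (illustrative assumption:
    block (0, 0) goes to process (0, 0)).
    """
    return ((i // mb) % p_rows, (j // nb) % p_cols)


def local_index(g, block, nprocs):
    """Map a global index g along one dimension to the local index
    on the process that owns it."""
    global_block = g // block            # which block g falls in
    local_block = global_block // nprocs  # how many earlier blocks the owner holds
    return local_block * block + g % block


def owner_grid(m, n, mb, nb, p_rows, p_cols):
    """Return an m-by-n nested list giving the owner of every matrix entry."""
    return [[owner(i, j, mb, nb, p_rows, p_cols) for j in range(n)]
            for i in range(m)]
```

Because consecutive blocks go to different processes, each process keeps a share of the remaining work as a factorization proceeds through the matrix, while the block size still lets each process operate on contiguous submatrices.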

Original language | English |
---|---|
Pages (from-to) | 151-180 |
Number of pages | 30 |
Journal | SIAM Review |
Volume | 37 |
Issue number | 2 |
State | Published - 1995 |
Externally published | Yes |