Abstract
SLATE (Software for Linear Algebra Targeting Exascale) is a distributed, dense linear algebra library targeting both CPU-only and GPU-accelerated systems, developed over the course of the Exascale Computing Project (ECP). While it began with several documents setting out its initial design, significant design changes occurred throughout its development. In some cases, these were anticipated: an early version used a simple consistency flag that was later replaced with a full-featured consistency protocol. In other cases, performance limitations and software and hardware changes prompted a redesign. Sequential communication tasks were parallelized; host-to-host MPI calls were replaced with GPU device-to-device MPI calls; more advanced algorithms such as Communication Avoiding LU and the Random Butterfly Transform (RBT) were introduced. Early choices that turned out to be cumbersome, error-prone, or inflexible have been replaced with simpler, more intuitive, or more flexible designs. Applications have been a driving force, prompting a lighter-weight queue class, nonuniform tile sizes, and more flexible MPI process grids. Of paramount importance has been building a portable library that works across several different GPU architectures – AMD, Intel, and NVIDIA – while keeping a clean and maintainable codebase. Here we explore the evolving design choices and their effects, both in terms of performance and software sustainability.
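To make the Random Butterfly Transform mentioned above concrete, the following is a minimal NumPy sketch (not SLATE code; all names here are illustrative) of a depth-1 RBT: random butterfly matrices U and V transform the system to U^T A V, whose factorization can then avoid pivoting, and the solution of the original system is recovered from the transformed one.

```python
import numpy as np

rng = np.random.default_rng(0)

def butterfly(n):
    """Random butterfly matrix B = 1/sqrt(2) * [[R, S], [R, -S]],
    where R and S are random diagonal matrices with entries
    exp(r/10), r uniform in [-1/2, 1/2] (a common RBT choice)."""
    h = n // 2
    r = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    s = np.diag(np.exp(rng.uniform(-0.5, 0.5, h) / 10))
    return np.block([[r, s], [r, -s]]) / np.sqrt(2)

n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)

U, V = butterfly(n), butterfly(n)
Ar = U.T @ A @ V                  # randomized matrix, factored without pivoting in RBT solvers
y = np.linalg.solve(Ar, U.T @ b)  # solve the transformed system
x = V @ y                         # undo the right-hand transform to recover x

print(np.allclose(A @ x, b))      # x solves the original system
```

Because butterfly matrices are applied in O(n^2) time and need only O(n) random values, the randomization is cheap relative to the O(n^3) factorization it enables to run without pivoting.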
Original language | English |
---|---|
Journal | International Journal of High Performance Computing Applications |
DOIs | |
State | Accepted/In press - 2024 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. This work was supported in part by the National Science Foundation under the BALLISTIC project, NSF grant 2004541. This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was supported by the U.S. Department of Energy (17-SC-20-SC) and the National Science Foundation (2004541).
Funders | Funder number |
---|---|
Office of Science and National Nuclear Security Administration | |
National Science Foundation | 2004541 |
Office of Science | DE-AC05-00OR22725 |
U.S. Department of Energy | 17-SC-20-SC |
Keywords
- High-performance computing
- Graphics processing unit computing
- Linear algebra