Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era

Mark Gates, Asim Yarkhan, Dalal Sukkari, Kadir Akbudak, Sebastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Mohammed Al Farhan, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

The SLATE project is implementing a distributed dense linear algebra library for highly scalable, distributed-memory, accelerator-based computer systems. The goal is to provide a library that can be easily ported to different hardware (CPUs, GPUs, accelerators) and will provide high performance for machines into the future. Current ports include CPUs, CUDA, ROCm, and oneAPI. We achieve both performance and portability by leveraging several layers and abstractions, including OpenMP tasks to track data dependencies, MPI for distributed communication, and the BLAS++ and LAPACK++ libraries developed as a portable layer across vendor-optimized CPU and GPU BLAS and LAPACK functionality. We rely on the C++ standard library and templating to reduce code duplication for better maintainability. The few kernels not present in BLAS are implemented in CUDA, HIP, and OpenMP target offload, and are easily ported to new platforms.
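
As a rough illustration of two ideas named in the abstract (OpenMP tasks that track data dependencies, and C++ templating to avoid per-precision code duplication), the following is a minimal sketch. It is not taken from SLATE: the `Tile` struct and the `tile_add` and `run_chain` functions are hypothetical names invented for this example, and the "kernel" is a trivial element-wise add standing in for a real BLAS call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tile type for illustration only; not SLATE's own tile class.
template <typename T>
struct Tile {
    std::vector<T> data;
    Tile(int n, T value) : data(static_cast<std::size_t>(n) * n, value) {}
};

// One templated kernel covers float, double, and other scalar types.
template <typename T>
void tile_add(const Tile<T>& src, Tile<T>& dst) {
    for (std::size_t i = 0; i < dst.data.size(); ++i)
        dst.data[i] += src.data[i];
}

// Runs two dependent tile updates and returns the first entry of C.
template <typename T>
T run_chain(int n) {
    Tile<T> A(n, T(1)), B(n, T(2)), C(n, T(0));

    #pragma omp parallel
    #pragma omp single
    {
        // Task 1 reads A and updates B.
        #pragma omp task depend(in: A) depend(inout: B)
        tile_add(A, B);

        // Task 2 reads B, so the runtime orders it after task 1.
        #pragma omp task depend(in: B) depend(inout: C)
        tile_add(B, C);
    }
    // All tasks have completed at the end of the parallel region.
    return C.data[0];
}
```

Calling `run_chain<double>(4)` or `run_chain<float>(4)` instantiates the same code for each precision. If the compiler is invoked without OpenMP support, the pragmas are ignored and the two updates simply run in program order, giving the same result.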

Original language: English
Title of host publication: Proceedings of P3HPC 2022
Subtitle of host publication: 2022 International Workshop on Performance, Portability and Productivity in HPC, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 36-46
Number of pages: 11
ISBN (Electronic): 9781665460217
State: Published - 2022
Externally published: Yes
Event: 5th IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC, P3HPC 2022 - Dallas, United States
Duration: Nov 13 2022 – Nov 18 2022

Publication series

Name: Proceedings of P3HPC 2022: 2022 International Workshop on Performance, Portability and Productivity in HPC, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 5th IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC, P3HPC 2022
Country/Territory: United States
City: Dallas
Period: 11/13/22 – 11/18/22

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • GPU computing
  • distributed computing
  • numerical linear algebra
