A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines

Ahmad Abdelfattah, Timothy Costa, Jack Dongarra, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Mawussi Zounon

Research output: Contribution to journal › Article › peer-review


Abstract

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. In particular, half precision is used in many very large scale applications, such as those associated with machine learning.

Original language: English
Article number: 21
Journal: ACM Transactions on Mathematical Software
Volume: 47
Issue number: 3
DOI: 10.1145/3431921
State: Published - Jun 2021
Externally published: Yes

Funding

This material is based upon work supported in part by the National Science Foundation under Grants No. OAC 1740250, CSR 1514286, and OAC 2004850; by NVIDIA; by the Department of Energy; and in part by the Russian Science Foundation, Agreement N14-11-00190. This project was also funded in part by the European Union's Horizon 2020 research and innovation programme under the NLAFET grant agreement No. 671633.

Authors' addresses: A. Abdelfattah, M. Gates, P. Luszczek, and S. Tomov, University of Tennessee, 1122 Volunteer Blvd., Suite 203, Knoxville, TN 37996-3450, USA; T. Costa, NVIDIA, Santa Clara; J. Dongarra, University of Tennessee, Oak Ridge National Laboratory, and University of Manchester; A. Haidar, NVIDIA; S. Hammarling and N. J. Higham, University of Manchester, Manchester, UK; J. Kurzak, AMD; M. Zounon, NAG Ltd., Manchester, UK.

© 2021 Association for Computing Machinery. https://doi.org/10.1145/3431921

Funders and grant numbers:

• National Science Foundation: CSR 1514286, OAC 2004850, OAC 1740250
• U.S. Department of Energy
• NVIDIA
• Horizon 2020 Framework Programme: 671633
• Russian Science Foundation: N14-11-00190

Keywords

• BLAS
• batched BLAS
