Efficient primitives for standard tensor linear algebra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

This paper presents the design and implementation of a low-level library to compute general sums and products over multi-dimensional arrays (tensors). Using only three low-level functions, the API at once generalizes core BLAS levels 1-3 and eliminates the need for most tensor transpositions. Despite their relatively low operation count, we show that these transposition steps can become performance-limiting in typical use cases of BLAS on tensors. The present API achieves peak performance on the same order of magnitude as vendor-optimized GEMM by using a code generator to output CUDA source code for all computational kernels. The outline of these kernels is a multi-dimensional generalization of the MAGMA BLAS matrix multiplication on GPUs. Separate transposition steps can be skipped because every kernel allows arbitrary multi-dimensional transpositions of its arguments. The library, including its methodology and programming techniques, is made available as SLACK. Future improvements to the library include a high-level interface that translates directly from a LaTeX-like equation syntax to a data-parallel computation.

Original language: English
Title of host publication: Proceedings of XSEDE 2016
Subtitle of host publication: Diversity, Big Data, and Science at Scale
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450347556
State: Published - Jul 17 2016
Externally published: Yes
Event: Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016 - Miami, United States
Duration: Jul 17 2016 - Jul 21 2016

Publication series

Name: ACM International Conference Proceeding Series
Volume: 17-21-July-2016

Conference

Conference: Conference on Diversity, Big Data, and Science at Scale, XSEDE 2016
Country/Territory: United States
City: Miami
Period: 07/17/16 - 07/21/16

Funding

Funder: National Science Foundation
Funder number: 1531590

Keywords

• BLAS
• GPU
• Tensor
• Transposition
