High-performance Matrix-Matrix multiplications of very small matrices

Ian Masliah, Ahmad Abdelfattah, A. Haidar, S. Tomov, Marc Baboulin, J. Falcou, J. Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

40 Scopus citations

Abstract

The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for obtaining high performance in many scientific computing applications. GEMMs for small matrices (of sizes less than 32), however, are not sufficiently optimized in existing libraries. In this paper we consider the case of many small GEMMs on either CPU or GPU architectures. This case often occurs in applications like big data analytics, machine learning, high-order FEM, and others. The GEMMs are grouped together in a single batched routine. We present algorithms and optimization techniques specialized for these cases that obtain performance within 90% of the optimal. We show that these results outperform currently available state-of-the-art implementations and vendor-tuned math libraries.
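The central idea in the abstract, grouping many independent small matrix products into one "batched" routine instead of issuing one call per product, can be sketched in plain Python. This is an illustrative sketch only: the paper's actual contributions are tuned CPU/GPU kernels, and the function names below (`gemm`, `batched_gemm`) are hypothetical, not the paper's API.

```python
# Illustrative sketch of a batched GEMM: a single call that multiplies
# many independent small matrix pairs. A real batched kernel processes
# all pairs concurrently to amortize call/launch overhead; here we just
# show the interface shape in plain Python (matrices as lists of lists).

def gemm(A, B):
    """Dense matrix-matrix product C = A * B."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def batched_gemm(As, Bs):
    """Multiply each pair (As[i], Bs[i]) in the batch in one call."""
    return [gemm(A, B) for A, B in zip(As, Bs)]

# Example: a batch of two 2x2 products.
As = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
Bs = [[[5, 6], [7, 8]], [[9, 8], [7, 6]]]
Cs = batched_gemm(As, Bs)
print(Cs[0])  # [[19, 22], [43, 50]]
print(Cs[1])  # [[9, 8], [7, 6]]
```

For the very small sizes the paper targets (dimensions below 32), per-call overhead dominates, which is exactly why a single batched entry point outperforms a loop of individual GEMM calls.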

Original language: English
Title of host publication: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016, Proceedings
Editors: Pierre-François Dutot, Denis Trystram
Publisher: Springer Verlag
Pages: 659-671
Number of pages: 13
ISBN (Print): 9783319436586
DOIs
State: Published - 2016
Externally published: Yes
Event: 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016 - Grenoble, France
Duration: Aug 24 2016 - Aug 26 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9833 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016
Country/Territory: France
City: Grenoble
Period: 08/24/16 - 08/26/16

Funding

This material is based in part upon work supported by the US NSF under Grants No. CSR 1514286 and ACI-1339822, NVIDIA, the Department of Energy, and in part by the Russian Scientific Foundation, Agreement N14-11-00190.

Funders and funder numbers:
National Science Foundation: CSR 1514286, ACI-1339822
U.S. Department of Energy
NVIDIA
Russian Science Foundation: N14-11-00190

Keywords

• Autotuning
• Batched GEMM
• GEMM
• HPC
• Small matrices
