Variable Batched DGEMM

Pedro Valero-Lara, Ivan Martinez-Perez, Sergi Mateo, Raul Sirvent, Vicenc Beltran, Xavier Martorell, Jesus Labarta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Many scientific applications need to solve a large number of small, independent problems. Individually, these problems do not expose enough parallelism, so they must be computed as a batch. Vendors such as Intel and NVIDIA are developing their own suites of batch routines. Although most existing work focuses on batches of fixed size, real applications cannot assume a uniform size across the whole set of problems. We explore and analyze different strategies based on the OpenMP parallel for, task, and taskloop pragmas. Although these strategies are straightforward from a programmer's point of view, they differ in their impact on performance. We also analyze a new prototype provided by Intel (MKL) for batch operations (cblas_dgemm_batch). We propose a new approach called grouping, which accumulates problems into a group until a limit on memory occupancy or number of operations is reached. Groups containing different numbers of problems are then distributed across cores, achieving a more balanced distribution of computational cost. This strategy is up to 6× faster than the Intel (MKL) batch routine.
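The grouping idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual algorithm, so everything below is an assumption made for illustration: the greedy, order-preserving packing, the use of an operation-count limit (rather than a memory limit), the usual 2·m·n·k cost estimate for a GEMM, and the hypothetical names gemm_flops, group_by_flops, and group_costs.

```python
from typing import List, Tuple

# One problem is (m, n, k), meaning C[m x n] += A[m x k] * B[k x n].
Problem = Tuple[int, int, int]


def gemm_flops(m: int, n: int, k: int) -> int:
    """Approximate operation count of one GEMM (assumed cost model: 2*m*n*k)."""
    return 2 * m * n * k


def group_by_flops(problems: List[Problem], flop_limit: int) -> List[List[int]]:
    """Greedily pack problem indices into groups, in input order.

    A group is closed as soon as adding the next problem would push its
    total cost above flop_limit. A problem that exceeds the limit on its
    own ends up in a group by itself.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_cost = 0
    for idx, (m, n, k) in enumerate(problems):
        cost = gemm_flops(m, n, k)
        if current and current_cost + cost > flop_limit:
            groups.append(current)
            current, current_cost = [], 0
        current.append(idx)
        current_cost += cost
    if current:
        groups.append(current)
    return groups


def group_costs(problems: List[Problem], groups: List[List[int]]) -> List[int]:
    """Total estimated operation count of each group."""
    return [sum(gemm_flops(*problems[i]) for i in g) for g in groups]


if __name__ == "__main__":
    sizes = [(2, 2, 2), (4, 4, 4), (2, 2, 2), (8, 8, 8), (2, 2, 2), (2, 2, 2)]
    groups = group_by_flops(sizes, flop_limit=200)
    print(groups)                      # indices of the problems in each group
    print(group_costs(sizes, groups))  # estimated cost of each group
```

In a scheme like this, each group (rather than each individual problem) would be the unit of work handed to a core, for example as one iteration of a parallel loop or one task. How the limit is chosen and how groups are scheduled in the paper is not stated in the abstract.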

Original language: English
Title of host publication: Proceedings - 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2018
Editors: Igor Kotenko, Ivan Merelli, Pietro Lio
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 363-367
Number of pages: 5
ISBN (Electronic): 9781538649756
State: Published - Jun 6 2018
Externally published: Yes
Event: 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2018 - Cambridge, United Kingdom
Duration: Mar 21 2018 to Mar 23 2018

Publication series

Name: Proceedings - 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2018

Conference

Conference: 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2018
Country/Territory: United Kingdom
City: Cambridge
Period: 03/21/18 to 03/23/18

Funding

Funder: Horizon 2020 Framework Programme
Funder number: 720270

Keywords

• Auto-tuning
• Batched BLAS
• DGEMM
• Intel Xeon
• OpenMP
• Runtime
