BLAS-3 Optimized by OmpSs Regions (LASs Library)

Pedro Valero-Lara, Sandra Catalán, Xavier Martorell, Jesús Labarta

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

10 Scopus citations

Abstract

In this paper we propose a set of optimizations for the BLAS-3 routines of the LASs library (Linear Algebra routines on OmpSs) and perform a detailed analysis of the impact of the proposed changes in terms of performance and execution time. OmpSs allows regions to be used in task dependences. This helps not only in programming the algorithmic optimizations, but also in reducing the execution time achieved by such optimizations. Different strategies are implemented to reduce the number of tasks created (when there is enough parallelism) during the execution of BLAS-3 operations in the original LASs. A better IPC is also obtained thanks to better exploitation of the memory hierarchy. More specifically, we increase performance, in particular on big matrices, by about 12% for TRSM and 17% for GEMM with respect to the original version of LASs, even when using fewer cores in the case of GEMM/SYMM. Moreover, when LASs is compared to PLASMA, the OpenMP-based reference dense linear algebra library, performance increases by up to 12.5% for GEMM/SYMM, while for TRSM/TRMM this value rises to 15%.
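The abstract does not describe the individual strategies, so the following is only a rough sketch of why coarser tasks reduce task-creation overhead in a tiled GEMM. It assumes a simple model in which each task updates one block of C using one block of A and one block of B. The function name `gemm_task_count`, the block sizes, and the matrix size are hypothetical and are not taken from LASs.

```python
def gemm_task_count(m, n, k, bm, bn, bk):
    """Count tasks in a tiled GEMM model (an assumption, not the LASs code).

    Each task updates one bm x bn block of C using one bm x bk block of A
    and one bk x bn block of B.
    """
    def blocks(extent, block):
        # Ceiling division in integer arithmetic.
        return -(-extent // block)

    return blocks(m, bm) * blocks(n, bn) * blocks(k, bk)


# Hypothetical problem size, chosen only for illustration.
size = 8192

# Fine-grained: one task per (i, j, k) block triple.
fine = gemm_task_count(size, size, size, 256, 256, 256)

# Coarser: one task per block of C, covering the whole k range.
# Such a task reads a full row panel of A and a full column panel of B.
coarse = gemm_task_count(size, size, size, 256, 256, size)

print(fine, coarse)  # 32768 1024
```

In this model the coarser variant depends on whole panels rather than single blocks, which is the kind of access pattern that a region-based dependence can describe. Whether this matches the strategies used in the paper is an assumption.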

Original language: English
Title of host publication: Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 25-32
Number of pages: 8
ISBN (Electronic): 9781728116440
DOIs
State: Published - Mar 19 2019
Externally published: Yes
Event: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 - Pavia, Italy
Duration: Feb 13 2019 to Feb 15 2019

Publication series

Name: Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019

Conference

Conference: 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
Country/Territory: Italy
City: Pavia
Period: 02/13/19 to 02/15/19

Funding

This project has received funding from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051), and the Juan de la Cierva Grant Agreement No. IJCI-2017-33511. We also acknowledge the funding provided by Fujitsu under the BSC-Fujitsu joint project: Math Libraries Migration and Optimization.

Funders and funder numbers:
Computación de Altas Prestaciones VII: TIN2015-65316-P
Spanish Ministry of Economy and Competitiveness
Generalitat de Catalunya: IJCI-2017-33511, 2014-SGR-1051
Fujitsu

    Keywords

    • BLAS-3
    • OmpSs
    • regions
    • tasking
