Abstract
In this paper we propose a set of optimizations for the BLAS-3 routines of the LASs library (Linear Algebra routines on OmpSs) and perform a detailed analysis of the impact of the proposed changes in terms of performance and execution time. OmpSs allows regions to be used in task dependences. This helps not only in programming the algorithmic optimizations, but also in reducing the execution time achieved by such optimizations. Different strategies are implemented to reduce the number of tasks created (when there is enough parallelism) during the execution of BLAS-3 operations in the original LASs. A better IPC is also obtained thanks to better exploitation of the memory hierarchy. More specifically, we increase performance, in particular on big matrices, by about 12% for TRSM and 17% for GEMM with respect to the original version of LASs, even when using fewer cores in the case of GEMM/SYMM. Moreover, when LASs is compared to PLASMA, the OpenMP reference dense linear algebra library, performance increases by up to 12.5% for GEMM/SYMM, while for TRSM/TRMM this value rises to 15%.
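As a rough illustration of the trade-off between task count and task granularity mentioned in the abstract, the following sketch counts the "tasks" generated by a blocked matrix multiplication at two granularities. It is not the LASs implementation: the function name `gemm_tiled`, the `merge_k` flag, and the choice of merging updates along the k dimension are assumptions made only for this example, and the tasks are simulated as plain loop bodies rather than OmpSs tasks.

```python
def gemm_tiled(A, B, C, bs, merge_k):
    """Blocked C += A * B on square lists of lists; returns the simulated task count.

    merge_k=False: one task per (i, j, k) tile update (fine-grained).
    merge_k=True:  one task per (i, j) tile of C, reading a whole block row of A
                   and a whole block column of B (coarser, fewer tasks).
    Hypothetical helper for illustration only; not part of LASs or OmpSs.
    """
    n = len(A)
    tasks = 0
    for i0 in range(0, n, bs):
        for j0 in range(0, n, bs):
            if merge_k:
                k_ranges = [(0, n)]
            else:
                k_ranges = [(k0, min(k0 + bs, n)) for k0 in range(0, n, bs)]
            for k_lo, k_hi in k_ranges:
                tasks += 1  # the loop nest below stands in for one task body
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, n)):
                        acc = C[i][j]
                        for k in range(k_lo, k_hi):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return tasks


n, bs = 8, 2
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i - j for j in range(n)] for i in range(n)]
C_fine = [[0] * n for _ in range(n)]
C_coarse = [[0] * n for _ in range(n)]

fine_tasks = gemm_tiled(A, B, C_fine, bs, merge_k=False)
coarse_tasks = gemm_tiled(A, B, C_coarse, bs, merge_k=True)
print(fine_tasks, coarse_tasks, C_fine == C_coarse)  # 64 16 True
```

Both variants compute the same product, but the coarser one creates a quarter of the tasks in this configuration. In a task-based runtime, a coarser task of this kind would need to declare a larger region of each input matrix as its dependence, which is where region support in the dependences becomes relevant.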
Original language | English |
---|---|
Title of host publication | Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 25-32 |
Number of pages | 8 |
ISBN (Electronic) | 9781728116440 |
State | Published - Mar 19 2019 |
Event | 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 - Pavia, Italy. Duration: Feb 13 2019 → Feb 15 2019 |
Publication series
Name | Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 |
---|---|
Conference
Conference | 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019 |
---|---|
Country/Territory | Italy |
City | Pavia |
Period | 02/13/19 → 02/15/19 |
Funding
This project has received funding from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051), and the Juan de la Cierva Grant Agreement No IJCI-2017-33511. We also acknowledge the funding provided by Fujitsu under the BSC-Fujitsu joint project: Math Libraries Migration and Optimization.
Keywords
- BLAS-3
- OmpSs
- regions
- tasking