Abstract
This work evaluates OpenMP tasking combined with target GPU offloading as a potential solution for programming productivity and performance on heterogeneous systems. It also proposes a new OpenMP specification, the OpenMP target task, which integrates OpenMP tasking and target GPU offloading in a single OpenMP pragma and so simplifies the implementation of heterogeneous codes. As a test case, the authors use one of the most widely used Level-3 Basic Linear Algebra Subprograms (BLAS) routines: the triangular solver (TRSM). To benefit from the heterogeneity of current high-performance computing systems, the authors propose a different parallelization of the algorithm based on a nonuniform decomposition of the problem, with target GPU offloading inside OpenMP tasks to address the heterogeneity of the hardware. This approach can outperform state-of-the-art algorithms, which use a uniform decomposition of the data, on both CPU-only and hybrid CPU-GPU systems, reaching speedups of up to one order of magnitude. It is faster than the IBM ESSL math library on CPU and competitive with a highly optimized heterogeneous CUDA version. Performance was analyzed on one node of Summit, the supercomputer at Oak Ridge National Laboratory.
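To make the idea of target offloading inside OpenMP tasks concrete, here is a minimal sketch using the two separate pragmas that standard OpenMP already provides (`task` and `target`). It is not the authors' implementation, and it does not use the proposed target task pragma. The function names (`trsm_panel`, `trsm_hetero`, `max_residual`), the column-major layout, and the split over right-hand-side columns are all assumptions made for illustration; the abstract does not describe the actual nonuniform decomposition. The sketch solves L X = B for a lower-triangular L, sending one wide panel of columns to the device and splitting the remaining columns into narrower CPU tasks.

```c
#include <assert.h>
#include <math.h>

#pragma omp declare target
/* Forward substitution on columns [c0, c1) of B, in place.
   L (n x n, lower triangular) and B (n x m) are column-major. */
static void trsm_panel(int n, const double *L, double *B, int c0, int c1)
{
    for (int j = c0; j < c1; ++j) {
        double *x = B + j * n;            /* column j of B */
        for (int i = 0; i < n; ++i) {
            double s = x[i];
            for (int k = 0; k < i; ++k)
                s -= L[k * n + i] * x[k]; /* L(i,k) * x(k) */
            x[i] = s / L[i * n + i];
        }
    }
}
#pragma omp end declare target

/* Hypothetical nonuniform split: the first gpu_cols columns form one
   wide panel offloaded from inside a task; the remaining columns are
   cut into CPU tasks of at most cpu_cols columns each. */
void trsm_hetero(int n, int m, const double *L, double *B,
                 int gpu_cols, int cpu_cols)
{
    assert(cpu_cols > 0 && gpu_cols >= 0 && gpu_cols <= m);

    #pragma omp parallel
    #pragma omp single
    {
        if (gpu_cols > 0) {
            #pragma omp task
            {
                /* Only the device panel of B is mapped, so the copy-back
                   cannot overwrite columns solved on the CPU. */
                #pragma omp target map(to: L[0:n*n]) map(tofrom: B[0:gpu_cols*n])
                trsm_panel(n, L, B, 0, gpu_cols);
            }
        }
        for (int c0 = gpu_cols; c0 < m; c0 += cpu_cols) {
            int c1 = (c0 + cpu_cols < m) ? c0 + cpu_cols : m;
            #pragma omp task
            trsm_panel(n, L, B, c0, c1);
        }
    }   /* all tasks complete at the barrier ending the parallel region */
}

/* Largest entry of |L*X - B0|, used to check a computed solution. */
double max_residual(int n, int m, const double *L,
                    const double *X, const double *B0)
{
    double worst = 0.0;
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int k = 0; k <= i; ++k)
                s += L[k * n + i] * X[j * n + k];
            double r = fabs(s - B0[j * n + i]);
            if (r > worst) worst = r;
        }
    return worst;
}
```

Calling `trsm_hetero(n, m, L, B, gpu_cols, cpu_cols)` with `gpu_cols` larger than `cpu_cols` gives the device a bigger share of the work than any single CPU task, which is the sense in which this split is nonuniform. Compiled without OpenMP support, the pragmas are ignored and the same code runs sequentially on the host.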
Original language | English |
---|---|
Title of host publication | Euro-Par 2021 |
Subtitle of host publication | Parallel Processing Workshops - Euro-Par 2021 International Workshops, 2021, Revised Selected Papers |
Editors | Ricardo Chaves, Dora B. Heras, Aleksandar Ilic, Didem Unat, Rosa M. Badia, Andrea Bracciali, Patrick Diehl, Anshu Dubey, Oh Sangyoon, Stephen L. Scott, Laura Ricci |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 445-455 |
Number of pages | 11 |
ISBN (Print) | 9783031061554 |
State | Published - 2022 |
Event | 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021 - Virtual, Online (Duration: Aug 30 2021 → Aug 31 2021) |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13098 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021 |
---|---|
City | Virtual, Online |
Period | 08/30/21 → 08/31/21 |
Funding
Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- BLAS
- CUDA
- Heterogeneity
- Linear algebra
- OpenMP
- TRSM
- Tasking