OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

8 Scopus citations

Abstract

This work evaluates OpenMP tasking combined with target GPU offloading as a potential solution for programming productivity and performance on heterogeneous systems. It also proposes a new OpenMP specification, OpenMP target task, which integrates OpenMP tasking and target GPU offloading in a single OpenMP pragma and thereby simplifies the implementation of heterogeneous codes. As a test case, the authors use one of the most widely used Level-3 Basic Linear Algebra Subprograms (BLAS) routines: the triangular solver (TRSM). To benefit from the heterogeneity of current high-performance computing systems, the authors propose a different parallelization of the algorithm based on a nonuniform decomposition of the problem, using target GPU offloading inside OpenMP tasks to address the heterogeneity of the hardware. This approach can outperform state-of-the-art algorithms, which use a uniform decomposition of the data, on both CPU-only and hybrid CPU-GPU systems, reaching speedups of up to one order of magnitude. It is faster than the IBM ESSL math library on the CPU and competitive with a highly optimized heterogeneous CUDA version. One node of Summit, Oak Ridge National Laboratory’s supercomputer, was used for the performance analysis.
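The combination the abstract describes, target offloading inside OpenMP tasks applied to a nonuniformly split TRSM, might look roughly like the C sketch below. It is an illustration under stated assumptions, not the authors' implementation: the problem sizes, the function names (`split_columns`, `trsm_host`, `trsm_device`, `run_split_trsm`), the column-wise split of the right-hand sides, and the simple forward-substitution kernel are all hypothetical, and it uses only existing `task` and `target` pragmas rather than the proposed single pragma, whose syntax the abstract does not give.

```c
#include <assert.h>

#define N    4   /* order of the triangular matrix (hypothetical size) */
#define NRHS 8   /* number of right-hand-side columns (hypothetical size) */

/* Nonuniform split: how many leading columns go to the device. */
int split_columns(int ncols, double device_fraction)
{
    assert(ncols >= 0);
    int n = (int)(ncols * device_fraction + 0.5);
    if (n < 0) n = 0;
    if (n > ncols) n = ncols;
    return n;
}

/* Forward substitution on columns [c0, c1): solves L * X = B in place.
   L is N x N lower triangular, B is N x NRHS, both column-major. */
void trsm_host(const double *L, double *B, int c0, int c1)
{
    for (int j = c0; j < c1; ++j)
        for (int i = 0; i < N; ++i) {
            double s = B[j * N + i];
            for (int k = 0; k < i; ++k)
                s -= L[k * N + i] * B[j * N + k];
            B[j * N + i] = s / L[i * N + i];
        }
}

/* Same solve, offloaded. Only the device's contiguous column block is
   mapped, so the copy-back cannot overwrite columns solved on the host. */
void trsm_device(const double *L, double *B, int c0, int c1)
{
    int ncols = c1 - c0;
    if (ncols <= 0) return;
    double *Bd = B + c0 * N;   /* first device column */
    int len = ncols * N;
    #pragma omp target teams distribute parallel for \
            map(to: L[0:N*N]) map(tofrom: Bd[0:len])
    for (int j = 0; j < ncols; ++j)
        for (int i = 0; i < N; ++i) {
            double s = Bd[j * N + i];
            for (int k = 0; k < i; ++k)
                s -= L[k * N + i] * Bd[j * N + k];
            Bd[j * N + i] = s / L[i * N + i];
        }
}

/* Builds a small test problem, solves it with one device task and one
   host task, and returns the largest absolute error in the solution. */
double run_split_trsm(double device_fraction)
{
    double L[N * N] = {0}, X[N * NRHS], B[N * NRHS];
    for (int i = 0; i < N; ++i)
        for (int k = 0; k <= i; ++k)
            L[k * N + i] = (k == i) ? 2.0 + i : 1.0;
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < N; ++i)
            X[j * N + i] = i + j + 1.0;          /* known solution */
    for (int j = 0; j < NRHS; ++j)
        for (int i = 0; i < N; ++i) {            /* B = L * X */
            double s = 0.0;
            for (int k = 0; k <= i; ++k)
                s += L[k * N + i] * X[j * N + k];
            B[j * N + i] = s;
        }

    int ndev = split_columns(NRHS, device_fraction);

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task shared(L, B)   /* target region inside a task */
        trsm_device(L, B, 0, ndev);
        #pragma omp task shared(L, B)   /* remaining columns on the CPU */
        trsm_host(L, B, ndev, NRHS);
        #pragma omp taskwait
    }

    double max_err = 0.0;
    for (int idx = 0; idx < N * NRHS; ++idx) {
        double d = B[idx] - X[idx];
        if (d < 0) d = -d;
        if (d > max_err) max_err = d;
    }
    return max_err;
}
```

Calling `run_split_trsm(0.75)` from a `main` would assign six of the eight columns to the device task and two to the host task. The columns of a TRSM right-hand side are independent, which is why the two tasks need no dependence between them here. On a compiler without offloading support the `target` region is expected to fall back to the host, so the sketch should still run on a CPU-only machine.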

Original language: English
Title of host publication: Euro-Par 2021
Subtitle of host publication: Parallel Processing Workshops - Euro-Par 2021 International Workshops, 2021, Revised Selected Papers
Editors: Ricardo Chaves, Dora B. Heras, Aleksandar Ilic, Didem Unat, Rosa M. Badia, Andrea Bracciali, Patrick Diehl, Anshu Dubey, Oh Sangyoon, Stephen L. Scott, Laura Ricci
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 445-455
Number of pages: 11
ISBN (Print): 9783031061554
DOIs
State: Published - 2022
Event: 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021 - Virtual, Online
Duration: Aug 30 2021 → Aug 31 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13098 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Parallel and Distributed Computing, Euro-Par 2021
City: Virtual, Online
Period: 08/30/21 → 08/31/21

Funding

Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • BLAS
  • CUDA
  • Heterogeneity
  • Linear algebra
  • OpenMP
  • TRSM
  • Tasking
