Juggler: A Dependence-Aware Task-Based Execution Framework for GPUs

Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Laxmi N. Bhuyan

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

2 Scopus citations

Abstract

Scientific applications with single instruction, multiple data (SIMD) computations show considerable performance improvements when run on today's graphics processing units (GPUs). However, data dependences across thread blocks may significantly reduce the speedup by requiring global synchronization across the streaming multiprocessors (SMs) inside the GPU. To efficiently run applications with interblock data dependences, we need fine-grained task-based execution models that treat the SMs inside a GPU as stand-alone parallel processing units. Such a scheme enables faster execution by utilizing all internal computation elements inside the GPU and eliminating unnecessary waits at device-wide global barriers. In this paper, we propose Juggler, a task-based execution scheme for GPU workloads with data dependences. The Juggler framework takes applications embedding OpenMP 4.5 tasks as input and executes them on the GPU via an efficient in-device runtime, hence eliminating the need for kernel-wide global synchronization. Juggler requires little or no modification to the source code, and once launched, the runtime runs entirely on the GPU without relying on the host throughout the entire execution. We have evaluated Juggler on an NVIDIA Tesla P100 GPU and obtained up to a 31% performance improvement over a global-barrier-based implementation, with minimal runtime overhead.
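As an illustration of the kind of input the abstract describes, the following is a minimal sketch of a tiled wavefront computation written with OpenMP 4.5 tasks and `depend` clauses. It is not taken from the paper: the tile count `NB`, the kernel `tile_work`, and the function `wavefront` are hypothetical names chosen for this example, and the code runs on the host rather than through Juggler's in-device runtime. It only shows how per-tile dependences can replace a global barrier between wavefronts.

```c
#define NB 4  /* tiles per dimension; illustrative value */

/* Hypothetical per-tile kernel: combines the results of the upper and left tiles. */
static long tile_work(long up, long left) {
    return up + left + 1;
}

/* Fills tiles 1..NB in each dimension; row 0 and column 0 form a zero halo
 * so that every tile has an upper and a left neighbour to depend on. */
void wavefront(long grid[NB + 1][NB + 1]) {
    for (int k = 0; k <= NB; k++) {
        grid[0][k] = 0;
        grid[k][0] = 0;
    }

    #pragma omp parallel
    #pragma omp single
    {
        /* One thread creates the tasks; any thread in the team may run them. */
        for (int i = 1; i <= NB; i++) {
            for (int j = 1; j <= NB; j++) {
                /* A tile becomes runnable as soon as its two inputs are
                 * written; no barrier separates successive wavefronts. */
                #pragma omp task firstprivate(i, j) shared(grid) \
                        depend(in: grid[i - 1][j], grid[i][j - 1]) \
                        depend(out: grid[i][j])
                grid[i][j] = tile_work(grid[i - 1][j], grid[i][j - 1]);
            }
        }
    }  /* the implicit barrier here waits for all outstanding tasks */
}
```

Calling `wavefront` on an `NB + 1` by `NB + 1` array of `long` leaves 69 in the bottom-right tile for `NB = 4`. If the file is compiled without OpenMP support, the pragmas are ignored and the loops run sequentially with the same result.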

Original language: English
Title of host publication: ACM SIGPLAN Notices
Publisher: Association for Computing Machinery
Pages: 54-67
Number of pages: 14
Volume: 53
Edition: 1
ISBN (Electronic): 9781450349116
DOIs
State: Published - Feb 10 2018

Bibliographical note

Publisher Copyright:
© 2018 ACM.

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under contract number DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. This work was also partially supported by NSF Grants CCF-1423108 and CCF-1513201, awarded to the University of California, Riverside. We thank Matt Martineau and Simon McIntosh-Smith from the University of Bristol for providing their code for some of the OpenMP-based kernels.

Funders                                        Funder number
U.S. Department of Energy Office of Science
National Science Foundation                    CCF-1423108, CCF-1513201
U.S. Department of Energy
Office of Science
National Nuclear Security Administration
Advanced Scientific Computing Research         DE-AC05-00OR22725

    Keywords

    • GP-GPU programming
    • data dependence
    • OpenMP 4.5
    • task-based execution
