Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs

Joshua Hoke Davis, Christopher Daley, Swaroop Pophale, Thomas Huber, Sunita Chandrasekaran, Nicholas J. Wright

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

8 Scopus citations

Abstract

Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today’s systems to tomorrow’s. Over the past decade and more, directives have been established as one of the promising paths to tackle programmatic challenges on emerging systems. This work focuses on applying and demonstrating OpenMP offloading directives on five proxy applications. We observe that the performance varies widely from one compiler to the other; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, there are other issues that must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC’s Cori system when using the Clang compiler, and a 15.7x speedup by switching max reductions to add reductions in the laplace mini-app when using the Cray-llvm compiler on Cori.

Original language: English
Title of host publication: Accelerator Programming Using Directives - 7th International Workshop, WACCPD 2020, Proceedings
Editors: Sridutt Bhalachandra, Sandra Wienke, Sunita Chandrasekaran, Guido Juckeland
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 25-44
Number of pages: 20
ISBN (Print): 9783030742232
State: Published - 2021
Event: 7th International Workshop on Accelerator Programming using Directives, WACCPD 2020 - Virtual, Online
Duration: Nov 20 2020 to Nov 20 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12655 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Workshop on Accelerator Programming using Directives, WACCPD 2020
City: Virtual, Online
Period: 11/20/20 to 11/20/20

Funding

Acknowledgements. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. This research also used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. The authors would like to thank Doug Doerfler and Rahul Gayatri for helpful discussion about the su3 benchmark and useful research directions for this project.

Funder: U.S. Department of Energy; Funder number: DE-AC02-05CH11231
Funder: Office of Science; Funder number: DE-AC05-00OR22725

    Keywords

    • Directive-based programming
    • GPU
    • Heterogeneous systems
    • NVIDIA
    • OpenMP
    • Performance portability
    • V100
