TY - GEN
T1 - Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs
AU - Davis, Joshua Hoke
AU - Daley, Christopher
AU - Pophale, Swaroop
AU - Huber, Thomas
AU - Chandrasekaran, Sunita
AU - Wright, Nicholas J.
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today’s systems to tomorrow’s. Over the past decade and more, directives have been established as one of the promising paths to tackle programmatic challenges on emerging systems. This work focuses on applying and demonstrating OpenMP offloading directives on five proxy applications. We observe that the performance varies widely from one compiler to the other; a crucial aspect of our work is reporting best practices to application developers who use OpenMP offloading compilers. While some issues can be worked around by the developer, there are other issues that must be reported to the compiler vendors. By restructuring OpenMP offloading directives, we gain an 18x speedup for the su3 proxy application on NERSC’s Cori system when using the Clang compiler, and a 15.7x speedup by switching max reductions to add reductions in the laplace mini-app when using the Cray-llvm compiler on Cori.
KW - Directive-based programming
KW - GPU
KW - Heterogeneous systems
KW - NVIDIA
KW - OpenMP
KW - Performance portability
KW - V100
UR - http://www.scopus.com/inward/record.url?scp=85105965004&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-74224-9_2
DO - 10.1007/978-3-030-74224-9_2
M3 - Conference contribution
AN - SCOPUS:85105965004
SN - 9783030742232
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 25
EP - 44
BT - Accelerator Programming Using Directives - 7th International Workshop, WACCPD 2020, Proceedings
A2 - Bhalachandra, Sridutt
A2 - Wienke, Sandra
A2 - Chandrasekaran, Sunita
A2 - Juckeland, Guido
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th International Workshop on Accelerator Programming using Directives, WACCPD 2020
Y2 - 20 November 2020
ER -