Abstract
Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to take advantage of a highly capable GPU through OpenMP directives. In this paper, we present an integer sum reduction annotated with OpenMP directives and evaluate the performance impact of tunable parameters with the AOMP and GCC compilers on an AMD MI100 GPU. In addition, we explain how the compilers implement the OpenMP reduction. Sweeping over the pruned parameter space, we find that the speedup is approximately 20 with AOMP, and that the reduction performance using AOMP is approximately 11% higher than that using GCC. However, the OpenMP offload performance is approximately 30% lower than that of reductions written with rocThrust or hipCUB.
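For readers unfamiliar with the construct, the following is a minimal sketch of what an integer sum reduction annotated with OpenMP target directives can look like. It is not the code evaluated in the paper. The function name `sum_reduce` and the parameters `teams` and `threads` are hypothetical, and treating the team count and per-team thread limit as the tunable parameters is an assumption, since the abstract does not name them.

```c
#include <stddef.h>

/* Illustrative sketch only, not the paper's benchmark code.
 * Sums n integers. When built with OpenMP offload support, the loop runs
 * on the target device. `teams` and `threads` are hypothetical stand-ins
 * for the tunable parameters mentioned in the abstract. */
long long sum_reduce(const int *a, size_t n, int teams, int threads)
{
    long long sum = 0;

    /* Copy the input to the device, distribute iterations over teams and
     * threads, and combine the per-thread partial sums with reduction(+). */
    #pragma omp target teams distribute parallel for \
        num_teams(teams) thread_limit(threads) \
        map(to: a[0:n]) map(tofrom: sum) reduction(+: sum)
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }

    return sum;
}
```

The flags that enable offload differ between compilers. If the code is built without OpenMP, the pragma is ignored and the loop runs serially on the host, producing the same result.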
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 496-499 |
Number of pages | 4 |
ISBN (Electronic) | 9781665497473 |
State | Published - 2022 |
Event | 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 - Virtual, Online, France; Duration: May 30, 2022 → Jun 3, 2022 |
Publication series
Name | Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
---|
Conference
Conference | 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022 |
---|---|
Country/Territory | France |
City | Virtual, Online |
Period | 05/30/22 → 06/03/22 |
Funding
We sincerely thank the reviewers for their comments and suggestions. This research used resources of the Experimental Computing Lab at Oak Ridge National Laboratory. This research was supported by the US Department of Energy Advanced Scientific Computing Research program under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- AMD GPU
- OpenMP target offload
- Reduction