TY - GEN
T1 - Using hybrid model OpenSHMEM+CUDA to implement the SHOC benchmark suite
AU - Grodowitz, Megan
AU - D’Azevedo, Eduardo
AU - Powers, Sarah
AU - Imam, Neena
N1 - Publisher Copyright: © Springer International Publishing AG 2016.
PY - 2016
Y1 - 2016
N2 - This work describes the process of porting the Scalable HeterOgeneous Computing (SHOC) benchmark suite from the hybrid MPI+CUDA implementation to OpenSHMEM+CUDA. SHOC includes a wide variety of benchmark kernels used to measure accelerator performance in both single node and cluster configurations. The hybrid model implementation attempts to place all major computation on accelerator devices, and uses MPI to synchronize and aggregate results. In some cases, MPI Groups are used to gradually reduce the number of accelerators used for computation as the problem size drops. Porting this behavior to OpenSHMEM required implementing several synchronizing collective operations, and using SHMEM teams to replace MPI Group functionality. Benchmark results on a Cray XK7 system with one GPU per compute node show that SHMEM performance is equal to MPI performance in these hybrid tasks. These results and porting experience show that using OpenSHMEM for accelerator devices benefits from adding functionality for synchronization and teams, and would further benefit from adding support for communication within accelerator kernels. (Notice: This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.)
KW - CUDA
KW - Parallel computing
KW - Programming models
KW - SHMEM
UR - http://www.scopus.com/inward/record.url?scp=85009452986&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-50995-2_14
DO - 10.1007/978-3-319-50995-2_14
M3 - Conference contribution
AN - SCOPUS:85009452986
SN - 9783319509945
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 204
EP - 216
BT - OpenSHMEM and Related Technologies
A2 - Venkata, Manjunath Gorentla
A2 - Imam, Neena
A2 - Pophale, Swaroop
A2 - Mintz, Tiffany M.
PB - Springer Verlag
T2 - 3rd workshop on OpenSHMEM and Related Technologies, OpenSHMEM 2016
Y2 - 2 August 2016 through 4 August 2016
ER -