TY - GEN
T1 - Exploring OpenSHMEM model to program GPU-based extreme-scale systems
AU - Potluri, Sreeram
AU - Rossetti, Davide
AU - Becker, Donald
AU - Poole, Duncan
AU - Venkata, Manjunath Gorentla
AU - Hernandez, Oscar
AU - Shamis, Pavel
AU - Lopez, M. Graham
AU - Baker, Mathew
AU - Poole, Wendy
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Extreme-scale systems with compute accelerators such as Graphics Processing Units (GPUs) have become popular for executing scientific applications. These systems are typically programmed using MPI and CUDA (for NVIDIA-based GPUs). However, there are many drawbacks to the MPI+CUDA approach. The orchestration required between the compute and communication phases of the application execution and the constraint that communication can only be initiated from serial portions on the Central Processing Unit (CPU) lead to scaling bottlenecks. To address these drawbacks, we explore the viability of using OpenSHMEM for programming these systems. In this paper, first, we make a case for supporting GPU-initiated communication and for the suitability of the OpenSHMEM programming model. Second, we present NVSHMEM, a prototype implementation of the proposed programming approach; port Stencil and Transpose benchmarks, which are representative of many scientific applications, from the MPI+CUDA model to OpenSHMEM; and evaluate the design and implementation of NVSHMEM. Finally, we discuss the opportunities and challenges of using OpenSHMEM to program these systems, and propose extensions to OpenSHMEM to achieve the full potential of this programming approach.
AB - Extreme-scale systems with compute accelerators such as Graphics Processing Units (GPUs) have become popular for executing scientific applications. These systems are typically programmed using MPI and CUDA (for NVIDIA-based GPUs). However, there are many drawbacks to the MPI+CUDA approach. The orchestration required between the compute and communication phases of the application execution and the constraint that communication can only be initiated from serial portions on the Central Processing Unit (CPU) lead to scaling bottlenecks. To address these drawbacks, we explore the viability of using OpenSHMEM for programming these systems. In this paper, first, we make a case for supporting GPU-initiated communication and for the suitability of the OpenSHMEM programming model. Second, we present NVSHMEM, a prototype implementation of the proposed programming approach; port Stencil and Transpose benchmarks, which are representative of many scientific applications, from the MPI+CUDA model to OpenSHMEM; and evaluate the design and implementation of NVSHMEM. Finally, we discuss the opportunities and challenges of using OpenSHMEM to program these systems, and propose extensions to OpenSHMEM to achieve the full potential of this programming approach.
UR - http://www.scopus.com/inward/record.url?scp=84952328130&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-26428-8_2
DO - 10.1007/978-3-319-26428-8_2
M3 - Conference contribution
AN - SCOPUS:84952328130
SN - 9783319264271
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 18
EP - 35
BT - OpenSHMEM and Related Technologies
A2 - Venkata, Manjunath Gorentla
A2 - Shamis, Pavel
A2 - Imam, Neena
A2 - Lopez, M. Graham
PB - Springer Verlag
T2 - 2nd Workshop on OpenSHMEM and Related Technologies, OpenSHMEM 2015
Y2 - 4 August 2015 through 6 August 2015
ER -