TY - GEN
T1 - Merged requests for better performance and productivity in multithreaded OpenSHMEM
AU - Boehm, Swen
AU - Pophale, Swaroop
AU - Baker, Matthew B.
AU - Venkata, Manjunath Gorentla
N1 - Publisher Copyright: © Springer International Publishing AG 2018.
PY - 2018
Y1 - 2018
AB - A merged request is a handle representing a group of Remote Memory Access (RMA), atomic, or collective operations. A merged request can be created either by combining multiple outstanding merged request handles or by reusing the same merged request handle for additional operations. We show that introducing such simple yet powerful semantics in OpenSHMEM provides many productivity and performance advantages. In this paper, we first introduce the interfaces and semantics for creating and using merged request handles. Then, we demonstrate that merged requests yield better performance characteristics in multithreaded OpenSHMEM applications. In particular, we show that one can achieve a higher message rate, higher bandwidth for smaller messages, and better computation-communication overlap. Further, we use merged requests to realize multithreaded collectives, where multiple threads cooperate to complete the collective operation. Our experimental results show that in a multithreaded OpenSHMEM program, merged-request-based RMA operations achieve over 100 Million Messages Per Second (MMPS). In a single-threaded environment, they achieve over 10 MMPS, compared to 4.5 MMPS with default RMA operations. We also achieve higher bandwidth for smaller message sizes, close to 100% overlap, and a 60% reduction in latency.
KW - Interoperability
KW - PGAS
KW - Shared memory
UR - http://www.scopus.com/inward/record.url?scp=85041500952&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-73814-7_3
DO - 10.1007/978-3-319-73814-7_3
M3 - Conference contribution
AN - SCOPUS:85041500952
SN - 9783319738130
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 35
EP - 49
BT - OpenSHMEM and Related Technologies
A2 - Gorentla Venkata, Manjunath
A2 - Imam, Neena
A2 - Pophale, Swaroop
PB - Springer Verlag
T2 - 4th Workshop on OpenSHMEM and Related Technologies: Big Compute and Big Data Convergence, OpenSHMEM 2017
Y2 - 7 August 2017 through 9 August 2017
ER -