TY - GEN
T1 - Network-friendly one-sided communication through multinode cooperation on petascale Cray XT5 systems
AU - Que, Xinyu
AU - Yu, Weikuan
AU - Tipparaju, Vinod
AU - Vetter, Jeffrey S.
AU - Wang, Bin
PY - 2011
Y1 - 2011
N2 - One-sided communication is important to enable asynchronous communication and data movement for Global Address Space (GAS) programming models. Such communication is typically realized through direct messages between initiator and target processes. For petascale systems with 10,000s of nodes and 100,000s of cores, these direct messages require dedicated communication buffers and/or channels, which can lead to significant scalability challenges for GAS programming models. In this paper, we describe a network-friendly communication model, multinode cooperation, to enable indirect one-sided communication. Compute nodes work together to handle one-sided requests through (1) request forwarding, in which one node can intercept a request and forward it to a target node, and (2) request aggregation, in which one node can aggregate many requests to a target node. We have implemented multinode cooperation for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). Our experimental results on a large-scale Cray XT5 system demonstrate that multinode cooperation is able to greatly increase memory scalability by reducing the communication buffers required on each node. In addition, multinode cooperation improves the resiliency of the GAS runtime system to network contention. Furthermore, multinode cooperation can benefit the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.
AB - One-sided communication is important to enable asynchronous communication and data movement for Global Address Space (GAS) programming models. Such communication is typically realized through direct messages between initiator and target processes. For petascale systems with 10,000s of nodes and 100,000s of cores, these direct messages require dedicated communication buffers and/or channels, which can lead to significant scalability challenges for GAS programming models. In this paper, we describe a network-friendly communication model, multinode cooperation, to enable indirect one-sided communication. Compute nodes work together to handle one-sided requests through (1) request forwarding, in which one node can intercept a request and forward it to a target node, and (2) request aggregation, in which one node can aggregate many requests to a target node. We have implemented multinode cooperation for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI). Our experimental results on a large-scale Cray XT5 system demonstrate that multinode cooperation is able to greatly increase memory scalability by reducing the communication buffers required on each node. In addition, multinode cooperation improves the resiliency of the GAS runtime system to network contention. Furthermore, multinode cooperation can benefit the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.
KW - ARMCI
KW - GAS
KW - Multinode Cooperation
KW - Request Aggregation
UR - http://www.scopus.com/inward/record.url?scp=79961139177&partnerID=8YFLogxK
U2 - 10.1109/CCGrid.2011.62
DO - 10.1109/CCGrid.2011.62
M3 - Conference contribution
AN - SCOPUS:79961139177
SN - 9780769543956
T3 - Proceedings - 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011
SP - 352
EP - 361
BT - Proceedings - 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011
T2 - 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011
Y2 - 23 May 2011 through 26 May 2011
ER -