TY - GEN
T1 - Virtual topologies for scalable resource management and contention attenuation in a Global Address Space model on the Cray XT5
AU - Yu, Weikuan
AU - Tipparaju, Vinod
AU - Que, Xinyu
AU - Vetter, Jeffrey S.
PY - 2011
Y1 - 2011
N2 - Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model, and support completely asynchronous data movement. Their underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. In this research, we investigate these challenges for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI), on large-scale Cray XT5 systems. We represent the management of communication resources as directed graphs, and propose two new scalable virtual topologies, Meshed Fully Connected Graphs (MFCG) and Cubic Fully Connected Graphs (CFCG), for scalable resource management and contention attenuation. To ensure deadlock-free communication in these multi-dimensional topologies, we design and develop lowest-dimension-first forwarding to support fully- or partially-populated MFCG and CFCG on any number of nodes. We have extensively evaluated the benefits of these virtual topologies on the petascale Jaguar Cray XT5 system at Oak Ridge National Laboratory. Our experimental results demonstrate MFCG as the most suitable virtual topology because of its benefits in resource management, contention mitigation, and the resulting benefit to scientific applications.
AB - Global Address Space (GAS) programming models enable a convenient, shared-memory style addressing model, and support completely asynchronous data movement. Their underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. In this research, we investigate these challenges for a popular GAS runtime library, Aggregate Remote Memory Copy Interface (ARMCI), on large-scale Cray XT5 systems. We represent the management of communication resources as directed graphs, and propose two new scalable virtual topologies, Meshed Fully Connected Graphs (MFCG) and Cubic Fully Connected Graphs (CFCG), for scalable resource management and contention attenuation. To ensure deadlock-free communication in these multi-dimensional topologies, we design and develop lowest-dimension-first forwarding to support fully- or partially-populated MFCG and CFCG on any number of nodes. We have extensively evaluated the benefits of these virtual topologies on the petascale Jaguar Cray XT5 system at Oak Ridge National Laboratory. Our experimental results demonstrate MFCG as the most suitable virtual topology because of its benefits in resource management, contention mitigation, and the resulting benefit to scientific applications.
KW - ARMCI
KW - Contention
KW - GAS
KW - Virtual topology
UR - http://www.scopus.com/inward/record.url?scp=80155183417&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2011.38
DO - 10.1109/ICPP.2011.38
M3 - Conference contribution
AN - SCOPUS:80155183417
SN - 9780769545103
T3 - Proceedings of the International Conference on Parallel Processing
SP - 235
EP - 244
BT - Proceedings - 2011 International Conference on Parallel Processing, ICPP 2011
T2 - 40th International Conference on Parallel Processing, ICPP 2011
Y2 - 13 September 2011 through 16 September 2011
ER -