TY - GEN
T1 - Reducing connection memory requirements of MPI for InfiniBand clusters
T2 - 7th IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2007
AU - Koop, Matthew J.
AU - Jones, Terry
AU - Panda, Dhabaleswar K.
PY - 2007
Y1 - 2007
N2 - Clusters in the area of high-performance computing have been growing in size at a considerable rate. In these clusters, the dominant programming model is the Message Passing Interface (MPI), so the MPI library has a key role in resource usage and performance. To obtain maximal performance, many clusters deploy a high-speed interconnect between compute nodes. One such interconnect, InfiniBand, has been gaining in popularity due to its various features, including Remote Direct Memory Access (RDMA), and its high performance. As a result, it is being deployed in a significant number of clusters and has been chosen as the standard interconnect for capacity clusters within the DOE Tri-Labs. As these clusters grow in size, care must be taken to ensure that resource usage does not increase excessively with scale. In particular, the MPI library resource usage should not grow at a rate that will exhaust the node memory or starve user applications. In this paper we present our findings on current memory usage when all connections are created and design a message coalescing method to decrease memory usage significantly. Our models show that the default configuration of MVAPICH can grow to 1 GB per process for 8K processes, while our enhancements reduce usage by an order of magnitude to around 120 MB per process while maintaining near-equal performance. We have validated our design on a 575-node cluster and shown no performance degradation for a variety of applications. We also increase the attainable message rate by over 150%.
AB - Clusters in the area of high-performance computing have been growing in size at a considerable rate. In these clusters, the dominant programming model is the Message Passing Interface (MPI), so the MPI library has a key role in resource usage and performance. To obtain maximal performance, many clusters deploy a high-speed interconnect between compute nodes. One such interconnect, InfiniBand, has been gaining in popularity due to its various features, including Remote Direct Memory Access (RDMA), and its high performance. As a result, it is being deployed in a significant number of clusters and has been chosen as the standard interconnect for capacity clusters within the DOE Tri-Labs. As these clusters grow in size, care must be taken to ensure that resource usage does not increase excessively with scale. In particular, the MPI library resource usage should not grow at a rate that will exhaust the node memory or starve user applications. In this paper we present our findings on current memory usage when all connections are created and design a message coalescing method to decrease memory usage significantly. Our models show that the default configuration of MVAPICH can grow to 1 GB per process for 8K processes, while our enhancements reduce usage by an order of magnitude to around 120 MB per process while maintaining near-equal performance. We have validated our design on a 575-node cluster and shown no performance degradation for a variety of applications. We also increase the attainable message rate by over 150%.
UR - http://www.scopus.com/inward/record.url?scp=34548312269&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2007.92
DO - 10.1109/CCGRID.2007.92
M3 - Conference contribution
AN - SCOPUS:34548312269
SN - 0769528333
SN - 9780769528335
T3 - Proceedings - Seventh IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2007
SP - 495
EP - 502
BT - Proceedings - Seventh IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2007
Y2 - 14 May 2007 through 17 May 2007
ER -