TY - JOUR
T1 - Performance analysis of MPI collective operations
AU - Pješivac-Grbović, Jelena
AU - Angskun, Thara
AU - Bosilca, George
AU - Fagg, Graham E.
AU - Gabriel, Edgar
AU - Dongarra, Jack J.
PY - 2007/6
Y1 - 2007/6
N2 - Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a general and feasible solution to the problem of optimizing collective communication is still missing. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the predictions from the models against experimentally gathered data and, using these results, construct an optimal decision function for the broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally optimal one. Additionally, we introduce a new form of optimized tree-based broadcast algorithm, splitted-binary. Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as being the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
AB - Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a general and feasible solution to the problem of optimizing collective communication is still missing. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the predictions from the models against experimentally gathered data and, using these results, construct an optimal decision function for the broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally optimal one. Additionally, we introduce a new form of optimized tree-based broadcast algorithm, splitted-binary. Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as being the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
KW - Hockney
KW - LogGP
KW - LogP
KW - MPI collective communication
KW - PLogP
KW - Parallel communication models
KW - Performance modeling
UR - http://www.scopus.com/inward/record.url?scp=34248676296&partnerID=8YFLogxK
U2 - 10.1007/s10586-007-0012-0
DO - 10.1007/s10586-007-0012-0
M3 - Article
AN - SCOPUS:34248676296
SN - 1386-7857
VL - 10
SP - 127
EP - 143
JO - Cluster Computing
JF - Cluster Computing
IS - 2
ER -