Performance analysis of MPI collective operations

Jelena Pješivac-Grbović, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, Jack J. Dongarra

Research output: Contribution to journal, Article, peer-review

127 Scopus citations

Abstract

Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a solution to the collective communication optimization problem that is both general and feasible is still missing. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the predictions from the models against experimentally gathered data and, using these results, construct an optimal decision function for the broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally optimal one. Additionally, we introduce a new form of optimized tree-based broadcast algorithm, splitted-binary. Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as being the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
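As an illustration of what a model-based decision function for broadcast can look like, the following minimal sketch uses the Hockney model, in which sending an m-byte message costs alpha + beta * m. The cost formulas for the linear, binomial-tree, and pipeline broadcasts are common textbook-style estimates and are not necessarily the exact expressions used in the paper. The function names, the default segment size, and the parameter values in the usage note are assumptions made for this example only.

```python
import math


def hockney_p2p(m, alpha, beta):
    """Hockney estimate for one m-byte message: latency alpha plus beta per byte."""
    return alpha + beta * m


def bcast_linear(p, m, alpha, beta):
    """Root sends the whole message to each of the other p - 1 processes in turn."""
    return (p - 1) * hockney_p2p(m, alpha, beta)


def bcast_binomial(p, m, alpha, beta):
    """Binomial tree: ceil(log2(p)) rounds, each forwarding the whole message."""
    return math.ceil(math.log2(p)) * hockney_p2p(m, alpha, beta)


def bcast_pipeline(p, m, alpha, beta, seg):
    """Chain of p processes; the message is cut into segments of at most seg bytes.

    The last process receives the final segment after (p - 1) + (n_seg - 1) steps.
    """
    n_seg = math.ceil(m / seg)
    return (p + n_seg - 2) * hockney_p2p(min(seg, m), alpha, beta)


def choose_bcast(p, m, alpha, beta, seg=8192):
    """Hypothetical decision function: pick the algorithm with the lowest predicted time.

    Assumes p >= 2 processes and a message of m > 0 bytes.
    """
    if p < 2 or m <= 0:
        raise ValueError("need at least two processes and a non-empty message")
    predictions = {
        "linear": bcast_linear(p, m, alpha, beta),
        "binomial": bcast_binomial(p, m, alpha, beta),
        "pipeline": bcast_pipeline(p, m, alpha, beta, seg),
    }
    best = min(predictions, key=predictions.get)
    return best, predictions
```

With assumed parameters such as alpha = 50e-6 s and beta = 1e-8 s/byte on 16 processes, this sketch selects the binomial tree for an 8-byte message and the segmented pipeline for an 8 MiB message. In practice such predictions would be checked against measurements, since the abstract itself cautions that complete reliance on models would not yield optimal results.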

Original language: English
Pages (from-to): 127-143
Number of pages: 17
Journal: Cluster Computing
Volume: 10
Issue number: 2
State: Published - Jun 2007
Externally published: Yes

Keywords

  • Hockney
  • LogGP
  • LogP
  • MPI collective communication
  • PLogP
  • Parallel communication models
  • Performance modeling
