Large-Message All-to-All Communication at Frontier Scale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Near the full scale of exascale supercomputers, latency can dominate the cost of all-to-all communication even for very large message sizes. We describe GPU-aware all-to-all implementations designed to reduce latency for large message sizes at extreme scales, and we present their performance using 65536 tasks (8192 nodes) on the Frontier supercomputer at the Oak Ridge Leadership Computing Facility. Two implementations perform best for different ranges of message size, and all outperform the vendor-provided MPI_Alltoall. Our results show promising options for improving implementations of MPI_Alltoall_init.
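
As background for the interfaces named above, the following is a minimal sketch, not the authors' implementation, of a GPU-aware MPI_Alltoall call with HIP device buffers (as on Frontier's AMD GPUs), followed by the persistent MPI_Alltoall_init/MPI_Start pattern from MPI 4.0 that the abstract suggests could be improved. The message size, iteration count, and omitted error handling are illustrative assumptions.

    /* Sketch: GPU-aware all-to-all with HIP device buffers.
       Assumes an MPI library with GPU support enabled
       (e.g., GPU-aware Cray MPICH); sizes are illustrative. */
    #include <mpi.h>
    #include <hip/hip_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int nranks;
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        /* 1 MiB of doubles per destination rank (illustrative). */
        size_t count = (1 << 20) / sizeof(double);

        double *sendbuf, *recvbuf;
        hipMalloc((void **)&sendbuf, count * nranks * sizeof(double));
        hipMalloc((void **)&recvbuf, count * nranks * sizeof(double));

        /* With a GPU-aware MPI, device pointers are passed directly;
           the library moves data GPU-to-GPU without explicit host staging. */
        MPI_Alltoall(sendbuf, (int)count, MPI_DOUBLE,
                     recvbuf, (int)count, MPI_DOUBLE, MPI_COMM_WORLD);

        /* Persistent form (MPI 4.0): set up once, then start/wait per
           iteration, which lets the library plan the exchange up front. */
        MPI_Request req;
        MPI_Alltoall_init(sendbuf, (int)count, MPI_DOUBLE,
                          recvbuf, (int)count, MPI_DOUBLE,
                          MPI_COMM_WORLD, MPI_INFO_NULL, &req);
        for (int iter = 0; iter < 10; ++iter) {
            MPI_Start(&req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
        }
        MPI_Request_free(&req);

        hipFree(sendbuf);
        hipFree(recvbuf);
        MPI_Finalize();
        return 0;
    }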

Original language: English
Title of host publication: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Networking, Storage, and Analysis, SC 2025 Workshops
Publisher: Association for Computing Machinery, Inc.
Pages: 461-467
Number of pages: 7
ISBN (Electronic): 9798400718717
State: Published - Nov 15, 2025
Event: 2025 Workshops of the International Conference on High Performance Computing, Networking, Storage, and Analysis, SC 2025 Workshops - St. Louis, United States
Duration: Nov 16, 2025 - Nov 21, 2025

Publication series

Name: Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Networking, Storage, and Analysis, SC 2025 Workshops

Conference

Conference: 2025 Workshops of the International Conference on High Performance Computing, Networking, Storage, and Analysis, SC 2025 Workshops
Country/Territory: United States
City: St. Louis
Period: 11/16/25 - 11/21/25

Funding

We thank the anonymous reviewers for suggestions regarding related work and performance results. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • All-to-all communication
  • GPU-aware MPI
