Performance Portability of Programming Strategies for Nearest-Neighbor Communication with GPU-Aware MPI

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

To better advise HPC application developers, we have implemented Faces, a nearest-neighbor microbenchmark that quantifies performance trade-offs. The Faces experiments presented here explore the following design choices: 1) fewer dependent messages versus more independent messages, 2) fewer fused GPU kernels versus more simple kernels, 3) number of GPU streams, 4) size of GPU thread blocks, and 5) linear versus blocked ordering of MPI ranks. We present weak-scaling performance of a latency-sensitive "small" per-rank domain and of a bandwidth-sensitive "large" per-rank domain, and we compare results for two high-performance computers with contrasting CPU, GPU, and interconnect architectures: Summit and Frontier. We find that using more independent messages tends to give better performance than using fewer dependent messages. We identify performance-portability recommendations for GPU streams and synchronization, but other aspects of performance show complicated dependence on problem size and computer.
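
To make the fifth design choice more concrete, the following is a minimal, hypothetical sketch of what "linear" versus "blocked" rank ordering can mean for a 3D grid of ranks. It is not the Faces code: the grid size, the value of `ranks_per_node`, the block shape, and all function names are assumptions chosen for illustration. The sketch only counts how many face-neighbor pairs end up on different nodes under each ordering.

```python
from itertools import product

def linear_node(coord, grid, ranks_per_node):
    """Node index when ranks are numbered row-major across the whole grid."""
    x, y, z = coord
    _, ny, nz = grid
    rank = (x * ny + y) * nz + z
    return rank // ranks_per_node

def blocked_node(coord, grid, block):
    """Node index when each node owns one compact block of the rank grid."""
    bx, by, bz = (c // b for c, b in zip(coord, block))
    nby, nbz = grid[1] // block[1], grid[2] // block[2]
    return (bx * nby + by) * nbz + bz

def off_node_faces(grid, node_of):
    """Count non-periodic face-neighbor pairs whose ranks sit on different nodes."""
    count = 0
    for coord in product(*(range(n) for n in grid)):
        for axis in range(3):
            nbr = list(coord)
            nbr[axis] += 1
            if nbr[axis] < grid[axis] and node_of(coord) != node_of(tuple(nbr)):
                count += 1
    return count

grid = (4, 4, 4)      # assumed 64-rank job, for illustration only
ranks_per_node = 8    # assumed value, not taken from the paper
block = (2, 2, 2)     # assumed per-node block shape (8 ranks per block)

linear = off_node_faces(grid, lambda c: linear_node(c, grid, ranks_per_node))
blocked = off_node_faces(grid, lambda c: blocked_node(c, grid, block))
print("off-node face pairs, linear ordering: ", linear)
print("off-node face pairs, blocked ordering:", blocked)
```

A lower off-node count does not by itself imply better performance; as the abstract notes, several aspects of performance depend in complicated ways on problem size and computer.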

Original language: English
Title of host publication: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Publisher: Association for Computing Machinery
Pages: 1070-1080
Number of pages: 11
ISBN (Electronic): 9798400707858
State: Published - Nov 12 2023
Externally published: Yes
Event: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States
Duration: Nov 12 2023 → Nov 17 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Country/Territory: United States
City: Denver
Period: 11/12/23 → 11/17/23

Keywords

  • Design Trade-Offs
  • GPU Kernels
  • GPU Streams
  • GPU Thread Blocks
  • GPU-Aware MPI
  • GPUs
  • Kernel Fusion
  • MPI
  • Nearest-Neighbor Communication
  • Overlap
  • Performance Portability
  • Pipelining
  • Programming
