CommBench: Micro-Benchmarking Hierarchical Networks with Multi-GPU, Multi-NIC Nodes

Mert Hidayetoglu, Simon Garcia De Gonzalo, Elliott Slaughter, Yu Li, Christopher Zimmer, Tekin Bicer, Bin Ren, William Gropp, Wen Mei Hwu, Alex Aiken

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Modern high-performance computing systems have multiple GPUs and network interface cards (NICs) per node. The resulting network architectures have multilevel hierarchies of subnetworks with different interconnect and software technologies. These systems offer multiple vendor-provided communication capabilities and library implementations (IPC, MPI, NCCL, RCCL, OneCCL) with APIs providing varying levels of performance across the different levels. Understanding this performance is currently difficult because of the wide range of architectures and programming models (CUDA, HIP, OneAPI). We present CommBench, a library with cross-system portability and a high-level API that enables developers to easily build microbenchmarks relevant to their use cases and gain insight into the performance (bandwidth & latency) of multiple implementation libraries on different networks. We demonstrate CommBench with three sets of microbenchmarks that profile the performance of six systems. Our experimental results reveal the effect of multiple NICs on optimizing the bandwidth across nodes and also present the performance characteristics of four available communication libraries within and across nodes of NVIDIA, AMD, and Intel GPU networks.

Original language: English
Title of host publication: ICS 2024 - Proceedings of the 38th ACM International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 426-436
Number of pages: 11
ISBN (Electronic): 9798400706103
State: Published - May 30 2024
Event: 38th ACM International Conference on Supercomputing, ICS 2024 - Kyoto, Japan
Duration: Jun 4 2024 to Jun 7 2024

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 38th ACM International Conference on Supercomputing, ICS 2024
Country/Territory: Japan
City: Kyoto
Period: 06/4/24 to 06/7/24
