Optimizing Communication in 2D Grid-Based MPI Applications at Exascale

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Exascale computing poses many challenges to achieving optimal performance on large numbers of nodes. A key challenge is the efficient use of the Message Passing Interface (MPI), a critical component for inter-process communication. This paper explores communication optimization strategies to harness the GPU-accelerated architectures of these supercomputers. We focus on MPI applications where processes form a two-dimensional process grid, a common arrangement in applications involving dense matrix operations. This configuration offers a unique opportunity to implement innovative strategies to improve performance and maintain effective load distribution. We study two applications, Dist-FW (APSP: all-pairs shortest path) and HPL-MxP (LU factorization with mixed precision), on two accelerated systems: Summit (IBM Power and NVIDIA V100) and Frontier (AMD EPYC and MI250X). These supercomputers are operated by the Oak Ridge Leadership Computing Facility (OLCF) and are currently ranked #1 and #5 on the Top500 list. We show how to scale up both applications to exascale levels and tackle the MPI challenges related to implementation, synchronization, and performance. We also compare the performance of several communication strategies at an unprecedented scale. Accurately predicting application performance becomes crucial for cost reduction as the computational scale grows. To address this, we suggest a hyperbolic model as a better alternative to the traditional one-sided asymptotic model for predicting future application performance at such large scales.

Original language: English
Title of host publication: Proceedings of the 30th European MPI Users' Group Meeting, EuroMPI 2023
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400709135
DOIs
State: Published - Nov 19 2023
Event: 30th European MPI Users' Group Meeting, EuroMPI 2023 - Bristol, United Kingdom
Duration: Sep 11 2023 - Sep 13 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 30th European MPI Users' Group Meeting, EuroMPI 2023
Country/Territory: United Kingdom
City: Bristol
Period: 09/11/23 - 09/13/23

Funding

This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE).

Funder: U.S. Department of Energy
