Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters

Piyush Sao, Hao Lu, Ramakrishnan Kannan, Vijay Thakkar, Richard Vuduc, Thomas Potok

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

We present an optimized Floyd-Warshall (Floyd-Warshall) algorithm that computes the All-pairs shortest path (APSP) for GPU accelerated clusters. The Floyd-Warshall algorithm due to its structural similarities to matrix-multiplication is well suited for highly parallel GPU architectures. To achieve high parallel efficiency, we address two key algorithmic challenges: reducing high communication overhead and addressing limited GPU memory. To reduce high communication costs, we redesign the parallel (a) to expose more parallelism, (b) aggressively overlap communication and computation with pipelined and asynchronous scheduling of operations, and (c) tailored MPI-collective. To cope with limited GPU memory, we employ an offload model, where the data resides on the host and is transferred to GPU on-demand. The proposed optimizations are supported with detailed performance models for tuning. Our optimized parallel Floyd-Warshall implementation is up to 5x faster than a strong baseline and achieves 8.1 PetaFLOPS/sec on 256∼nodes of the Summit supercomputer at Oak Ridge National Laboratory. This performance represents 70% of the theoretical peak and 80% parallel efficiency. The offload algorithm can handle 2.5x larger graphs with a 20% increase in overall running time.

Original languageEnglish
Title of host publicationHPDC 2021 - Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
PublisherAssociation for Computing Machinery, Inc
Pages121-131
Number of pages11
ISBN (Electronic)9781450382175
DOIs
StatePublished - Jun 21 2021
Event30th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2021 - Virtual, Online, Sweden
Duration: Jun 21 2021Jun 25 2021

Publication series

NameHPDC 2021 - Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference30th International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2021
Country/TerritorySweden
CityVirtual, Online
Period06/21/2106/25/21

Funding

∗This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Corresponding author: Piyush Sao, email: [email protected] ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only. HPDC ’21, June 21–25, 2021, Virtual Event, Sweden © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8217-5/21/06...$15.00 https://doi.org/10.1145/3431379.3460651 which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Robinson Pino, program manager, under contract number DE-AC05-00OR22725, as well as by the National Science Foundation under Grant Nos. 1533768 and 1710371. This research used resources of the Oak Ridge Leadership Computing Facility,

Keywords

  • all-pair shortest path
  • distributed algorithm
  • floyd-warshall algorithm
  • gpu computing
  • semi-ring algebra

Fingerprint

Dive into the research topics of 'Scalable All-pairs Shortest Paths for Huge Graphs on Multi-GPU Clusters'. Together they form a unique fingerprint.

Cite this