GPU-aware non-contiguous data movement in Open MPI

Wei Wu, George Bosilca, Rolf VandeVaart, Sylvain Jeaugey, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

16 Scopus citations

Abstract

Due to better parallel density and power efficiency, GPUs have become more popular for use in scientific applications. Many of these applications are based on the ubiquitous Message Passing Interface (MPI) programming paradigm, and take advantage of non-contiguous memory layouts to exchange data between processes. However, support for efficient non-contiguous data movement for GPU-resident data is still in its infancy, which degrades overall application performance. To address this shortcoming, we present a solution that takes advantage of the inherent parallelism in the datatype packing and unpacking operations. We developed a close integration between Open MPI's stack-based datatype engine, NVIDIA's Unified Memory Architecture, and GPUDirect capabilities. In this design, the datatype packing and unpacking operations are offloaded onto the GPU and handled by specialized GPU kernels, while the CPU remains the driver for data movements between nodes. By incorporating our design into the Open MPI library, we have shown significantly better performance for non-contiguous GPU-resident data transfers on both shared and distributed memory machines.
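
To make the packing and unpacking operations mentioned in the abstract concrete, the following is a minimal CPU-side sketch in plain C. It is not the paper's implementation and does not use the Open MPI datatype engine or any GPU code. The `vec_layout` struct and the `pack_vector` and `unpack_vector` functions are hypothetical names introduced here for illustration; the layout mirrors the count, blocklength, and stride arguments of `MPI_Type_vector`. The point it illustrates is that every block copy is independent of the others, which is the kind of parallelism a GPU kernel could exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a vector-like non-contiguous layout:
 * `count` blocks of `blocklen` doubles each, with consecutive blocks
 * starting `stride` doubles apart (all sizes in units of double). */
typedef struct {
    size_t count;
    size_t blocklen;
    size_t stride;
} vec_layout;

/* Pack: gather the strided blocks of `src` into the contiguous buffer
 * `packed`. Returns the number of doubles written.
 * Each iteration touches a disjoint block, so the iterations could run
 * in parallel (assumption: this is where a GPU kernel would help). */
size_t pack_vector(const double *src, vec_layout l, double *packed)
{
    assert(l.blocklen <= l.stride); /* blocks must not overlap */
    for (size_t i = 0; i < l.count; ++i) {
        memcpy(packed + i * l.blocklen,
               src + i * l.stride,
               l.blocklen * sizeof(double));
    }
    return l.count * l.blocklen;
}

/* Unpack: scatter the contiguous buffer `packed` back into the strided
 * blocks of `dst`. Elements of `dst` that fall in the gaps between
 * blocks are left untouched. Returns the number of doubles read. */
size_t unpack_vector(const double *packed, vec_layout l, double *dst)
{
    assert(l.blocklen <= l.stride); /* blocks must not overlap */
    for (size_t i = 0; i < l.count; ++i) {
        memcpy(dst + i * l.stride,
               packed + i * l.blocklen,
               l.blocklen * sizeof(double));
    }
    return l.count * l.blocklen;
}
```

For example, with `count = 3`, `blocklen = 2`, and `stride = 4`, packing a source array holding the values 0 through 11 gathers the elements at indices 0, 1, 4, 5, 8, and 9 into a six-element contiguous buffer; unpacking writes them back to the same indices of the destination.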

Original language: English
Title of host publication: HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 231-242
Number of pages: 12
ISBN (Electronic): 9781450343145
State: Published - May 31 2016
Event: 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016 - Kyoto, Japan
Duration: May 31 2016 to Jun 4 2016

Publication series

Name: HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016
Country/Territory: Japan
City: Kyoto
Period: 05/31/16 to 06/4/16

Keywords

  • Datatype
  • GPU
  • Hybrid architecture
  • MPI
  • Non-contiguous data
