
Dual Channel Dual Staging: Hierarchical and Portable Staging for GPU-Based In-Situ Workflow

Research output: Conference contribution (chapter in book/report/conference proceeding), peer-reviewed

Abstract

In-situ workflows have emerged as an attractive approach for addressing data movement challenges at very large scales. Since GPU-based architectures now dominate the HPC landscape, porting these in-situ workflows, and specifically the inter-application data exchange, to GPU-based systems can be challenging. Technologies such as GPUDirect RDMA (GDR), typically used for I/O in GPU applications as an optimization that circumvents CPU overhead, can be leveraged to support bulk data exchanges between GPU applications. However, current GDR designs often lack performance portability across HPC clusters built with different hardware configurations. Furthermore, the local CPU can also be used effectively as an auxiliary communication mechanism to offload data exchanges. In this paper, we present a dual channel dual staging approach for efficient, scalable, and performance-portable inter-application data exchange for in-situ workflows. This approach exploits the data access pattern within in-situ workflows, along with their inherent execution asynchrony, to accelerate data exchanges while improving performance portability. Specifically, the dual channel dual staging method leverages both the local CPU and the remote data staging server to build a hierarchical joint staging area, and uses this staging area to transform blocking inter-application bulk data exchanges into best-effort local data movements between GPU and CPU. Dual channel dual staging is implemented as a portability extension of the DataSpaces-GPU staging framework. We present an experimental evaluation of its performance, portability, and scalability using this implementation on three leadership GPU clusters. The evaluation results demonstrate that the dual channel dual staging method saves up to 75% in data-exchange time compared to host-based, GDR, and alternative portable designs, while maintaining scalability (up to 512 GPUs) and performance portability across the three platforms.
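The central mechanism the abstract describes, absorbing a producer's bulk put into a fast local (CPU) staging tier and asynchronously draining it to the remote staging server where consumers read it, can be illustrated with a toy model. This is only a sketch of the general idea under stated assumptions; all class and method names here are hypothetical, and the actual DataSpaces-GPU implementation operates on GPU memory, RDMA transports, and multiple nodes rather than Python threads:

```python
import threading
import queue
import time

class DualStagingSketch:
    """Toy model of hierarchical dual staging: a local tier (standing in
    for host CPU memory) absorbs puts immediately, while a background
    drainer forwards data to a simulated remote staging server."""

    def __init__(self):
        self.local = queue.Queue()   # local staging area (best-effort, fast path)
        self.remote = {}             # remote staging server (simulated as a dict)
        self._lock = threading.Lock()
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def put(self, key, data):
        # Best-effort local movement: the producer returns as soon as the
        # data lands in the local staging tier, instead of blocking on a
        # remote bulk transfer.
        self.local.put((key, data))

    def _drain(self):
        # Asynchronously forward staged objects to the remote tier,
        # overlapping the transfer with the producer's computation.
        while True:
            key, data = self.local.get()
            with self._lock:
                self.remote[key] = data
            self.local.task_done()

    def get(self, key, timeout=1.0):
        # Consumer reads from the remote staging tier; it may briefly
        # wait for the drainer to catch up.
        deadline = time.time() + timeout
        while time.time() < deadline:
            with self._lock:
                if key in self.remote:
                    return self.remote[key]
            time.sleep(0.001)
        raise KeyError(key)

staging = DualStagingSketch()
staging.put("step0/field", [1.0, 2.0, 3.0])   # returns immediately
print(staging.get("step0/field"))
```

The design point the sketch captures is that the producer's `put` cost is decoupled from the remote transfer: the local tier turns a blocking inter-application exchange into a local handoff, and asynchrony hides the forwarding latency.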

Original language: English
Title of host publication: Proceedings - 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics, HiPC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 188-198
Number of pages: 11
ISBN (Electronic): 9798331509095
DOIs
State: Published - 2024
Event: 31st Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2024 - Bangalore, India
Duration: Dec 18, 2024 - Dec 21, 2024

Publication series

Name: Proceedings - 2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics, HiPC 2024

Conference

Conference: 31st Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2024
Country/Territory: India
City: Bangalore
Period: 12/18/24 - 12/21/24

Funding

This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC, visualization, database, or grid resources that have contributed to the research results reported within this paper. URL: http://www.tacc.utexas.edu. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). This work is also based upon work by the RAPIDS2 Institute supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research through the Scientific Discovery through Advanced Computing (SciDAC) program under Award Number DE-SC0023130. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Keywords

  • Data Staging
  • Extreme-Scale Data Management
  • GPUDirect RDMA
  • High Performance Computing
  • In-Situ Workflow
