IRIS-MASH: Efficient Multi-device Asynchronous Multi-Stream Heterogeneous Computing

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In the rapidly evolving field of high-performance computing (HPC), effectively leveraging heterogeneous devices through asynchronous task programming is paramount. This paper presents a robust asynchronous task programming model tailored for a multi-device, multi-stream execution environment that incorporates a diverse array of heterogeneous computing units, including GPUs from various vendors and other accelerators. Current state-of-the-art task programming models provide methodologies to support asynchronous task executions, but they typically handle homogeneous devices using native programming languages, while support for heterogeneous devices is limited to frameworks like OpenCL. This gap presents significant challenges in abstracting heterogeneous devices to harness their true asynchronous capabilities effectively using their native programming languages. By implementing asynchronous task execution, our model significantly boosts the performance of tiled algorithm task graphs through overlapping data transfers with computation and enabling the simultaneous execution of multiple kernels. We integrate this approach into a heterogeneous Intelligent Runtime System (IRIS) and assess its performance using a suite of tiled algorithm benchmarks from the heterogeneous math kernels library (MatRIS) based on IRIS. Experimental results demonstrate a performance improvement ranging from 1.6× to 2× over IRIS without asynchronous support, and a notable 22% performance enhancement compared to established runtime systems such as StarPU and PaRSEC. This approach significantly improves computation efficiency of HPC workflows and provides a solid base for future exploration and development in the area of asynchronous task programming in heterogeneous systems.

Original language: English
Title of host publication: 54th International Conference on Parallel Processing, ICPP 2025 - Main Conference Proceedings
Publisher: Association for Computing Machinery, Inc
Pages: 764-773
Number of pages: 10
ISBN (Electronic): 9798400720741
State: Published - Dec 20 2025
Event: 54th International Conference on Parallel Processing, ICPP 2025 - San Diego, United States
Duration: Sep 8 2025 to Sep 11 2025

Publication series

Name: 54th International Conference on Parallel Processing, ICPP 2025 - Main Conference Proceedings

Conference

Conference: 54th International Conference on Parallel Processing, ICPP 2025
Country/Territory: United States
City: San Diego
Period: 09/8/25 to 09/11/25

Funding

This research used resources of the Oak Ridge Leadership Computing Facility and the Experimental Computing Laboratory (ExCL) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the US Department of Energy under Contract No. DE-AC05-00OR22725. This research was funded in part by the DOE ASCR MAGNET: MAthematics, ComputinG, and NETworking for Resource-Efficient Computational Science project in the DOE ASCR Office. This manuscript has been authored by UT-Battelle, LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE).

Keywords

  • Asynchronous
  • CUDA
  • HIP
  • HPC
  • Heterogeneous runtime
  • IRIS
  • OpenCL
  • OpenMP
  • heterogeneous computing
  • multi-device
  • multi-stream
  • task scheduling
