IRIS-GNN: Leveraging Graph Neural Networks for Scheduling on Truly Heterogeneous Runtime Systems

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The diversity of accelerators in computer systems poses significant challenges for software developers, such as managing vendor-specific compiler toolchains, code fragmentation requiring different kernel implementations, and performance portability issues. To address these, the Intelligent Runtime System (IRIS) was developed. IRIS works across various systems, from smartphones to supercomputers, enabling automatic performance scaling based on available accelerators. It introduces abstract tasks for seamless execution transitions between accelerators while ensuring memory consistency and task dependencies. Although IRIS simplifies system details, optimal dynamic scheduling still requires user input to understand workload structures. To address this, we introduce a new scheduling policy for IRIS, termed IRIS-GNN, which is the first IRIS hybrid policy that operates in conjunction with the dynamic policies. This policy employs a Graph-Neural Network (GNN) to conduct Graph Classification of any task graphs submitted to IRIS. This GNN analyzes the structure and attributes of the task graph, categorizing it as either locality, concurrency, or mixed. This classification subsequently guides the selection of the dynamic policy used by IRIS. We provide a comparison of the performance of IRIS-GNN against the complete spectrum of IRIS's dynamic policies, assess the overhead introduced by the GNN within this scheduling framework, and ultimately explore its practical application in real-world scenarios.
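
The classify-then-select flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the IRIS-GNN implementation: the classifier here is a crude width-versus-depth heuristic standing in for the GNN, the policy names in `POLICY_FOR_CLASS` are hypothetical placeholders (the abstract does not say which dynamic policy each class selects), and the dictionary-based task graph is an assumed representation rather than the IRIS API.

```python
from enum import Enum


class GraphClass(Enum):
    LOCALITY = "locality"
    CONCURRENCY = "concurrency"
    MIXED = "mixed"


# Hypothetical mapping: these policy names are placeholders, not IRIS identifiers.
POLICY_FOR_CLASS = {
    GraphClass.LOCALITY: "locality_aware_policy",
    GraphClass.CONCURRENCY: "load_balancing_policy",
    GraphClass.MIXED: "default_dynamic_policy",
}


def task_levels(deps):
    """Return each task's level: the length of its longest dependency chain.

    deps maps a task name to the list of tasks it depends on (acyclic).
    """
    levels = {}

    def level(task):
        if task not in levels:
            parents = deps.get(task, [])
            levels[task] = 1 + max((level(p) for p in parents), default=0)
        return levels[task]

    for task in deps:
        level(task)
    return levels


def classify(deps):
    """Stand-in for the GNN classifier (assumption: a simple structural heuristic).

    A pure chain is labeled locality, a set of independent tasks is labeled
    concurrency, and anything else is labeled mixed.
    """
    levels = task_levels(deps)
    depth = max(levels.values())
    width = max(
        sum(1 for lvl in levels.values() if lvl == d) for d in range(1, depth + 1)
    )
    if width == 1:
        return GraphClass.LOCALITY
    if depth == 1:
        return GraphClass.CONCURRENCY
    return GraphClass.MIXED


def select_policy(deps):
    """Pick the dynamic policy that the graph's class maps to."""
    return POLICY_FOR_CLASS[classify(deps)]


chain = {"a": [], "b": ["a"], "c": ["b"]}
independent = {"a": [], "b": [], "c": []}
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

for name, graph in [("chain", chain), ("independent", independent), ("diamond", diamond)]:
    print(name, classify(graph).value, select_policy(graph))
```

The dispatch table keeps the classification step separate from the policy choice, so a trained graph classifier could replace `classify` without touching the selection logic.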

Original language: English
Title of host publication: Proceedings of SC 2024-W
Subtitle of host publication: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1071-1080
Number of pages: 10
ISBN (Electronic): 9798350355543
DOIs
State: Published - 2024
Event: 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024 - Atlanta, United States
Duration: Nov 17 2024 → Nov 22 2024

Publication series

Name: Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Country/Territory: United States
City: Atlanta
Period: 11/17/24 → 11/22/24

Funding

This research used resources of the Experimental Computing Laboratory (ExCL) and the Oak Ridge Leadership Computing Facility (OLCF) at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was supported by the following sources: 1) Defense Advanced Research Projects Agency (DARPA) Microsystems Technology Office (MTO) Domain-Specific System-on-Chip Program and 2) U.S. Department of Defense Advanced Computing Initiative (ACI), Brisbane: Productive Programming Systems in the Era of Extremely Heterogeneous and Ephemeral Computer Architectures. This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

Keywords

  • accelerators
  • high-performance computing
  • runtime systems
  • scheduling
