TY - GEN
T1 - IRIS
T2 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
AU - Johnston, Beau
AU - Miniskar, Narasinga Rao
AU - Young, Aaron
AU - Monil, Mohammad Alaul Haque
AU - Lee, Seyong
AU - Vetter, Jeffrey S.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - High-Performance Computing is becoming increasingly heterogeneous, relying on a diverse mix of hardware to achieve good performance. Paradoxically, current drivers and frameworks for these devices typically require separate languages and implementations for each vendor. Furthermore, there are few tools and little support to schedule codes between these devices in a truly heterogeneous manner, partly because of this fragmentation between vendors and the languages each supports. To overcome both limitations, the Intelligent Runtime System (IRIS) was developed. It allows a common task abstraction to be shared automatically among contemporary vendors and is driven from a single host-side API. At runtime, IRIS queries the host system and registers which frameworks and drivers are available; these determine which kernels can be used by the scheduler: CPUs via OpenMP, Nvidia GPUs via CUDA, AMD GPUs via HIP, and Intel and Xilinx FPGAs via OpenCL. IRIS enables tasks to be scheduled to any heterogeneous device and resolves to the appropriate kernel binary at runtime; it only uses the devices supported by the system on which it is run. IRIS supports single-task and graph-based expressions of task dependencies. Additionally, IRIS features a range of dynamic scheduling policies, allowing complex chains of tasks and interactions to be executed and relieving the programmer from having to consider the system in order to assign tasks to devices optimally. This paper presents the peak performance attainable by IRIS over a range of systems, each with a different number and mix of accelerator devices, and highlights the flexibility of IRIS: these devices are truly heterogeneous, relying on different backends (drivers, frameworks, and languages) that historically required unique implementations to utilize them. We then use this peak performance as a baseline to compare increasingly complex chains of tasks (with increasingly complex task dependencies) and evaluate how IRIS copes. Finally, we consider the performance of different IRIS scheduling policies on this range of task graphs.
AB - High-Performance Computing is becoming increasingly heterogeneous, relying on a diverse mix of hardware to achieve good performance. Paradoxically, current drivers and frameworks for these devices typically require separate languages and implementations for each vendor. Furthermore, there are few tools and little support to schedule codes between these devices in a truly heterogeneous manner, partly because of this fragmentation between vendors and the languages each supports. To overcome both limitations, the Intelligent Runtime System (IRIS) was developed. It allows a common task abstraction to be shared automatically among contemporary vendors and is driven from a single host-side API. At runtime, IRIS queries the host system and registers which frameworks and drivers are available; these determine which kernels can be used by the scheduler: CPUs via OpenMP, Nvidia GPUs via CUDA, AMD GPUs via HIP, and Intel and Xilinx FPGAs via OpenCL. IRIS enables tasks to be scheduled to any heterogeneous device and resolves to the appropriate kernel binary at runtime; it only uses the devices supported by the system on which it is run. IRIS supports single-task and graph-based expressions of task dependencies. Additionally, IRIS features a range of dynamic scheduling policies, allowing complex chains of tasks and interactions to be executed and relieving the programmer from having to consider the system in order to assign tasks to devices optimally. This paper presents the peak performance attainable by IRIS over a range of systems, each with a different number and mix of accelerator devices, and highlights the flexibility of IRIS: these devices are truly heterogeneous, relying on different backends (drivers, frameworks, and languages) that historically required unique implementations to utilize them. We then use this peak performance as a baseline to compare increasingly complex chains of tasks (with increasingly complex task dependencies) and evaluate how IRIS copes. Finally, we consider the performance of different IRIS scheduling policies on this range of task graphs.
KW - Execution Model
KW - Heterogeneous Computing
KW - Heterogeneous Systems
KW - High-Performance Computing
KW - Performance Portability
KW - Programming Models
KW - Runtime System
KW - Scheduling Policy
KW - Task Schedule
UR - http://www.scopus.com/inward/record.url?scp=85200749927&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW63119.2024.00017
DO - 10.1109/IPDPSW63119.2024.00017
M3 - Conference contribution
AN - SCOPUS:85200749927
T3 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
SP - 58
EP - 67
BT - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 May 2024 through 31 May 2024
ER -