Abstract
The growing demand for more powerful high-performance computing (HPC) systems has led to a steady rise in energy consumption by supercomputing centers worldwide. This study compares our Application-Topology Mapper (ATMapper) to the popular Simple Linux Utility for Resource Management (SLURM) in order to explore methods that can further optimize job scheduling within HPC systems. ATMapper is an artificial intelligence (AI)-based approach to job scheduling that is currently being enhanced with quantum annealing (QA) to generate optimal schedules faster. We are applying QA to speed up the ATMapper process and achieve higher computing efficiency, thereby reducing HPC energy consumption. Here, we examine how four job-scheduling approaches perform in processor node assignment on an example network architecture of 4 interconnected nodes. Using a specialized script, we assess the schedule of a computation flow with 11 interdependent tasks. Data movements among nodes were tracked to count the number of interactions (network hops) between nodes needed to complete the tasks. The total number of hops and the job completion time were then used to quantify the efficiency of the different mapping approaches. In addition to SLURM, we also compare ATMapper to the QA-enabled LBNL TIGER and the D-Wave Distributed Computing processor assignment approaches. Preliminary results showed that our topology-aware, latency-adaptive ATMapper is significantly more efficient than the other scheduling approaches due to its load-imbalance-aware network allocation. The scheduler displayed a computing efficiency of 53% by performing significantly fewer network hops than its alternatives. By reducing the number of hops, ATMapper was able to complete all 11 tasks using only 3 of the 4 given nodes. This research indicates the potential of QA/AI for HPC job scheduling.
In future work, we will test a SLURM simulator program to draw further comparisons on the effectiveness of ATMapper's scheduling approach. The results of this comparison will serve as a baseline for later improving SLURM's performance using a QA-enhanced ATMapper approach.
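The hop-counting metric described in the abstract can be illustrated with a minimal sketch: given a task dependency graph and a task-to-node mapping, total network hops is the sum of shortest-path distances between the nodes hosting each dependent task pair. The 4-node ring topology, the 11-task DAG, and the two mappings below are illustrative assumptions, not the configuration actually used in the paper.

```python
# Hypothetical 4-node ring interconnect (assumed for illustration only).
NODES = [0, 1, 2, 3]
EDGES = {(0, 1), (1, 2), (2, 3), (3, 0)}


def hop_distance(a, b):
    """Shortest-path hop count between nodes a and b via BFS."""
    if a == b:
        return 0
    frontier, seen, hops = {a}, {a}, 0
    while frontier:
        hops += 1
        nxt = set()
        for u in frontier:
            for v in NODES:
                if ((u, v) in EDGES or (v, u) in EDGES) and v not in seen:
                    if v == b:
                        return hops
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
    raise ValueError("nodes are disconnected")


def total_hops(deps, mapping):
    """Sum hop counts over every task dependency (u -> v): a dependency
    whose endpoints sit on different nodes costs one network transfer of
    hop_distance(node_of_u, node_of_v) hops."""
    return sum(hop_distance(mapping[u], mapping[v]) for u, v in deps)


# Toy DAG of 11 interdependent tasks (illustrative, not the paper's workload).
deps = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6),
        (5, 7), (6, 7), (7, 8), (8, 9), (9, 10)]

# A naive round-robin mapping over all 4 nodes versus a locality-aware
# mapping that clusters dependent tasks onto only 3 nodes.
round_robin = {t: t % 4 for t in range(11)}
clustered = {t: 0 if t <= 3 else (1 if t <= 7 else 2) for t in range(11)}

print(total_hops(deps, round_robin))  # 16 hops
print(total_hops(deps, clustered))    # 2 hops
```

As the toy numbers suggest, a mapping that co-locates communicating tasks can cut network hops sharply while leaving a node idle, which mirrors the abstract's observation that ATMapper completed all 11 tasks on 3 of the 4 nodes.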
Original language | English |
---|---|
Title of host publication | Workshops Program, Posters Program, Panels Program and Tutorials Program |
Editors | Candace Culhane, Greg T. Byrd, Hausi Muller, Yuri Alexeev, Sarah Sheldon |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 614-615 |
Number of pages | 2 |
ISBN (Electronic) | 9798331541378 |
DOIs | |
State | Published - 2024 |
Event | 5th IEEE International Conference on Quantum Computing and Engineering, QCE 2024 - Montreal, Canada Duration: Sep 15 2024 → Sep 20 2024 |
Publication series
Name | Proceedings - IEEE Quantum Week 2024, QCE 2024 |
---|---|
Volume | 2 |
Conference
Conference | 5th IEEE International Conference on Quantum Computing and Engineering, QCE 2024 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 09/15/24 → 09/20/24 |
Funding
This research, conducted under DOE WDTS VFP support, used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the U.S. DOE Office of Science under Contract No. DE-AC05-00OR22725.
Keywords
- HPC resource allocation optimization
- application-specific topology-aware HPC scheduling
- latency-adaptive parallel SW/HW mapping
- load imbalance
- quantum annealing (QA)