Abstract
Job scheduling at supercomputing facilities is important for achieving high utilization of these valuable resources while ensuring effective execution of jobs submitted by users. The jobs are scheduled according to their specified resource demands, such as expected job completion times, and the available resources based on allocations. Jobs that overrun their allocated times are terminated, for example, after a grace-period. It is non-trivial and often very complex for users to accurately estimate the completion times of their jobs, and consequently they face a dilemma: underestimate the job time to have a higher priority and risk job termination due to overrun, or overestimate it to ensure its completion and risk its delayed execution. In this paper, we investigate whether providing a grace-period can benefit facility performance by developing a game-theoretic model between a facility provider and multiple users for a simplified scheduling scenario based on job execution times. We present closed-form expressions for the provider's and users' best-response strategies to maximize their respective utility functions. We describe conditions under which offering a grace-period is advantageous to both the facility provider and users by deriving the Nash equilibrium of the game.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2021 IEEE 24th International Conference on Information Fusion, FUSION 2021 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9781737749714 |
| State | Published - 2021 |
| Event | 24th IEEE International Conference on Information Fusion, FUSION 2021 - Sun City, South Africa; Duration: Nov 1 2021 → Nov 4 2021 |
Publication series
| Name | Proceedings of 2021 IEEE 24th International Conference on Information Fusion, FUSION 2021 |
|---|---|
Conference
| Conference | 24th IEEE International Conference on Information Fusion, FUSION 2021 |
|---|---|
| Country/Territory | South Africa |
| City | Sun City |
| Period | 11/1/21 → 11/4/21 |
Funding
This work is funded by the RAMSES project, Office of Advanced Computing Research, U.S. Department of Energy, and performed at Oak Ridge National Laboratory, managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725; it is partially supported by grant UGC/FDS14/E01/19 from the Research Grants Council of the Hong Kong Special Administrative Region.
Keywords
- Game theory
- Grace-period
- Job completion times
- Supercomputers
- Under- and over-requested time