Speculative scheduling for stochastic HPC applications

Ana Gainaru, Guillaume Pallez Aupy, Hongyang Sun, Padma Raghavan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

Emerging fields are developing a growing number of large-scale applications with heterogeneous, dynamic, and data-intensive requirements. These applications put a high emphasis on productivity and thus are not tuned to run efficiently on today's high performance computing (HPC) systems. Some of them, such as neuroscience workloads and those that use adaptive numerical algorithms, develop modeling and simulation workflows with stochastic execution times and unpredictable resource requirements. Deploying them on current HPC systems with existing resource management solutions can result in a loss of efficiency for users and a decrease in effective system utilization for platform providers. In this paper, we consider the current HPC scheduling model and describe the challenge it poses for stochastic applications due to the strict requirements of its job deployment policies. To address this challenge, we present speculative scheduling techniques that adapt the resource requirements of a stochastic application on the fly, based on its past execution behavior instead of relying on estimates given by the user. We focus on improving overall system utilization and application response time without disrupting the current HPC scheduling model or the application development process. Our solution can operate alongside existing HPC batch schedulers without interfering with their usage modes. We show that speculative scheduling can improve system utilization and average application response time by 25-30% compared to the classical HPC approach.
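The idea of deriving resource requests from past execution behavior, rather than from a single user estimate, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the choice of empirical quantiles, and the cost model (a job that overruns its reservation is charged for the full reservation and restarts from scratch under the next, larger one) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def reservation_sequence(past_runtimes: Sequence[float],
                         quantiles: Sequence[float] = (0.5, 0.9, 1.0)) -> List[float]:
    """Build increasing walltime requests from empirical quantiles of past runs.

    Hypothetical helper: the quantile levels are an illustrative choice,
    not values taken from the paper.
    """
    if not past_runtimes:
        raise ValueError("need at least one past runtime")
    ordered = sorted(past_runtimes)
    n = len(ordered)
    requests: List[float] = []
    for q in quantiles:
        # Nearest-rank quantile: smallest observed runtime covering a fraction q of runs.
        idx = max(0, math.ceil(q * n) - 1)
        value = ordered[idx]
        # Keep the sequence strictly increasing; skip duplicate levels.
        if not requests or value > requests[-1]:
            requests.append(value)
    return requests


def total_reserved_time(runtime: float, requests: Sequence[float]) -> float:
    """Sum the reservations used until the job fits in one of them.

    Assumed cost model: every failed attempt is paid in full and the job
    restarts from the beginning (no checkpointing).
    """
    reserved = 0.0
    for request in requests:
        reserved += request
        if runtime <= request:
            return reserved
    raise ValueError("runtime exceeds the largest request")


if __name__ == "__main__":
    history = [10, 12, 15, 20, 40]  # made-up past runtimes, in minutes
    requests = reservation_sequence(history)
    for runtime in history:
        print(runtime, total_reserved_time(runtime, requests))
```

Under these assumptions, short runs finish inside the first, smaller reservation, while long runs pay for one failed attempt before succeeding. Whether that trade-off beats a single conservative request depends on the runtime distribution.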

Original language: English
Title of host publication: Proceedings of the 48th International Conference on Parallel Processing, ICPP 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450362955
State: Published - Aug 5 2019
Externally published: Yes
Event: 48th International Conference on Parallel Processing, ICPP 2019 - Kyoto, Japan
Duration: Aug 5 2019 → Aug 8 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 48th International Conference on Parallel Processing, ICPP 2019
Country/Territory: Japan
City: Kyoto
Period: 08/5/19 → 08/8/19

Keywords

  • HPC runtime
  • Scheduling algorithm
  • Stochastic applications
