Quantifying scheduling challenges for exascale system software

Oscar H. Mondragon, Patrick G. Bridges, Terry Jones

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

The move towards high-performance computing (HPC) applications comprised of coupled codes and the need to dramatically reduce data movement is leading to a reexamination of time-sharing vs. space-sharing in HPC systems. In this paper, we discuss and begin to quantify the performance impact of a move away from strict space-sharing of nodes for HPC applications. Specifically, we examine the potential performance cost of time-sharing nodes between application components, we determine whether a simple coordinated scheduling mechanism can address these problems, and we research how suitable simple constraint-based optimization techniques are for solving scheduling challenges in this regime. Our results demonstrate that current general-purpose HPC system software scheduling and resource allocation systems are subject to significant performance deficiencies, which we quantify for six representative applications. Based on these results, we discuss areas in which additional research is needed to meet the scheduling challenges of next-generation HPC systems.

Original language: English
Title of host publication: Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015 - In conjunction with HPDC 2015
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450336062
State: Published - Jun 16 2015
Event: 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015 - Portland, United States
Duration: Jun 16 2015 → …

Publication series

Name: Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015 - In conjunction with HPDC 2015

Conference

Conference: 5th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2015
Country/Territory: United States
City: Portland
Period: 06/16/15 → …

Funding

This work was supported in part by the 2013 Exascale Operating and Runtime Systems Program from the DOE Office of Science, Advanced Scientific Computing Research, under award number DE-SC0005050, program manager Sonia Sachs, and by the Colciencias-Fulbright Colombia and the Universidad Autonoma de Occidente through the Caldas scholarships program.

Keywords

  • Performance
  • Scheduling
  • Time-sharing
