Preemptive resource management for dynamically arriving tasks in an oversubscribed heterogeneous computing system

Dylan Machovec, Sudeep Pasricha, Anthony A. Maciejewski, Howard Jay Siegel, Gregory A. Koenig, Michael Wright, Marcia Hilton, Rajendra Rambharos, Thomas Naughton, Neena Imam

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

We design resource management heuristics that assign serial tasks to the nodes of a heterogeneous high performance computing (HPC) system. The value of completing these tasks is modeled using monotonically decreasing utility functions that represent the time-varying importance of the task. The value of completing a task is equal to its utility function at the time of its completion. The overall performance of this system is measured using the total utility earned by all tasks during some interval of time. To maximize the performance of such a system where the preemption of tasks is possible, we have designed, analyzed, and compared a set of resource allocation heuristic techniques. We combine two utility-aware heuristics with three different preemption techniques to create six preemption-capable heuristics. We also consider the two utility-aware heuristics without preemption. We use simulation studies to evaluate this set of eight heuristics and compare them with an FCFS heuristic, which is often used in real systems, and random assignments. In general, our set of eight heuristics is able to significantly outperform the comparison heuristics, and the preemption-capable heuristics are able to significantly increase the utility earned compared to the heuristics that do not use preemption. We analyze the performance tradeoffs among the different preemption-capable heuristics under a variety of oversubscribed workload environments.
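The utility model in the abstract can be illustrated with a small sketch. The abstract does not give the shape of the utility functions or the details of the heuristics, so everything below is an assumption made for illustration: the linear decay, the single node, and the names `Task`, `linear_decay`, and `total_utility` are hypothetical and are not taken from the paper. The sketch only shows how "utility earned at completion time" is summed, and why serving tasks in arrival order can earn less than serving an urgent task first.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    arrival: float                      # time the task enters the system
    exec_time: float                    # execution time on the single node
    utility: Callable[[float], float]   # utility as a function of completion time


def linear_decay(start_value: float, rate: float, arrival: float) -> Callable[[float], float]:
    """Hypothetical monotonically decreasing utility: linear decay, floored at 0."""
    def u(completion_time: float) -> float:
        return max(0.0, start_value - rate * (completion_time - arrival))
    return u


def total_utility(order: List[Task]) -> float:
    """Run tasks back to back on one node in the given order (no preemption)
    and sum each task's utility evaluated at its completion time."""
    clock = 0.0
    earned = 0.0
    for task in order:
        clock = max(clock, task.arrival) + task.exec_time
        earned += task.utility(clock)
    return earned


# Two illustrative tasks that arrive together: a long, slowly decaying task
# and a short task whose utility decays quickly.
a = Task("A", arrival=0.0, exec_time=10.0, utility=linear_decay(10.0, 0.5, 0.0))
b = Task("B", arrival=0.0, exec_time=2.0, utility=linear_decay(20.0, 2.0, 0.0))

fcfs_order = [a, b]            # A is queued first, so B waits behind it
utility_first_order = [b, a]   # run the urgent task first

print(total_utility(fcfs_order))           # 5.0
print(total_utility(utility_first_order))  # 20.0
```

In this toy case the urgent task loses all of its value while waiting behind the long task, which is the kind of loss that utility-aware ordering, and preemption when the long task is already running, is meant to reduce.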

Original language: English
Title of host publication: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 54-64
Number of pages: 11
ISBN (Electronic): 9781538634080
DOIs
State: Published - Jun 30 2017
Event: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017 - Orlando, United States
Duration: May 29 2017 to Jun 2 2017

Publication series

Name: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017

Conference

Conference: 31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017
Country/Territory: United States
City: Orlando
Period: 05/29/17 to 06/2/17

Funding

This manuscript has been administered by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory (ORNL), supported by the Extreme Scale Systems Center at ORNL, which is supported by the Department of Defense (DoD); and by NSF Grant CCF-1302693. This work also utilized CSU’s ISTeC Cray system, which is supported by the National Science Foundation (NSF) under grant number CNS-0923386.

Funder: Funder number
U.S. Department of Energy: DE-AC05-00OR22725
National Science Foundation: CCF-1302693, CNS-0923386
U.S. Department of Defense
Oak Ridge National Laboratory

    Keywords

    • heterogeneous computing
    • preemption
    • resource management heuristics
    • scheduling
    • utility functions
