Reservation and Checkpointing Strategies for Stochastic Jobs

Ana Gainaru, Brice Goglin, Valentin Honore, Guillaume Pallez Aupy, Padma Raghavan, Yves Robert, Hongyang Sun

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

In this paper, we are interested in scheduling and checkpointing stochastic jobs on a reservation-based platform, whose cost depends both (i) on the reservation made, and (ii) on the actual execution time of the job. Stochastic jobs are jobs whose execution time cannot be determined easily. They arise from the heterogeneous, dynamic and data-intensive requirements of new emerging fields such as neuroscience. In this study, we assume that jobs can be interrupted at any time to take a checkpoint, and that job execution times follow a known probability distribution. Based on past experience, the user has to determine a sequence of fixed-length reservation requests, and to decide whether the state of the execution should be checkpointed at the end of each request. The objective is to minimize the expected cost of a successful execution of the jobs. We provide an optimal strategy for discrete probability distributions of job execution times, and we design fully polynomial-time approximation strategies for continuous distributions with bounded support. These strategies are then experimentally evaluated and compared to standard approaches such as periodic-length reservations and simple checkpointing strategies (either checkpoint all reservations, or none). The impact of an imprecise knowledge of checkpoint and restart costs is also assessed experimentally.
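To make the objective concrete, the following is a minimal sketch of how the expected cost of one fixed reservation sequence could be evaluated for a discrete distribution of execution times. It is not the paper's optimal strategy or its approximation scheme. The abstract does not give the cost function, so everything here is an assumption made for illustration: the affine cost `alpha * length + beta * used + gamma` per reservation, the parameter names `ckpt_cost` and `restart_cost`, and the rule that an unfinished reservation without a checkpoint loses the progress it made.

```python
from typing import Sequence, Tuple

# One reservation request: (length, take a checkpoint at the end?)
Reservation = Tuple[float, bool]


def run_cost(
    x: float,
    seq: Sequence[Reservation],
    alpha: float = 1.0,       # assumed price per reserved time unit
    beta: float = 0.0,        # assumed price per time unit actually used
    gamma: float = 0.0,       # assumed fixed price per reservation
    ckpt_cost: float = 0.0,   # assumed time needed to write a checkpoint
    restart_cost: float = 0.0,  # assumed time needed to reload a checkpoint
) -> float:
    """Total cost paid until a job of execution time x completes."""
    saved = 0.0   # work preserved by the latest checkpoint
    total = 0.0   # cost accumulated over failed reservations
    for length, checkpoint in seq:
        restart = restart_cost if saved > 0 else 0.0
        # Time left for computation; a checkpointed reservation keeps
        # its last ckpt_cost time units for writing the checkpoint.
        usable = length - restart - (ckpt_cost if checkpoint else 0.0)
        remaining = x - saved
        if usable >= remaining:
            used = restart + remaining
            return total + alpha * length + beta * used + gamma
        # The job did not finish: the whole reservation is consumed.
        total += alpha * length + beta * length + gamma
        if checkpoint and usable > 0:
            saved += usable  # progress survives into the next reservation
        # Without a checkpoint, the progress of this reservation is lost.
    raise ValueError("reservation sequence too short for this execution time")


def expected_cost(
    values: Sequence[float],
    probs: Sequence[float],
    seq: Sequence[Reservation],
    **cost_params: float,
) -> float:
    """Expected cost over a discrete distribution P(X = values[i]) = probs[i]."""
    return sum(p * run_cost(v, seq, **cost_params) for v, p in zip(values, probs))


if __name__ == "__main__":
    values, probs = [2.0, 5.0], [0.5, 0.5]
    params = {"ckpt_cost": 1.0, "restart_cost": 1.0}
    candidates = {
        "no checkpoint": [(2.0, False), (5.0, False)],
        "checkpoint first": [(3.0, True), (4.0, False)],
        "single long request": [(5.0, False)],
    }
    for name, seq in candidates.items():
        print(name, expected_cost(values, probs, seq, **params))
```

Comparing a few hand-written candidate sequences in this way only illustrates the trade-off between reservation length and checkpoint overhead; finding the best sequence is the optimization problem the paper addresses.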

Original language: English
Title of host publication: Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 853-863
Number of pages: 11
ISBN (Electronic): 9781728168760
State: Published - May 2020
Externally published: Yes
Event: 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020 - New Orleans, United States
Duration: May 18 2020 to May 22 2020

Publication series

Name: Proceedings - 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020

Conference

Conference: 34th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2020
Country/Territory: United States
City: New Orleans
Period: 05/18/20 to 05/22/20

Funding

Acknowledgments: We thank the anonymous reviewers for their comments and suggestions. This research was supported in part by the Vanderbilt Institutional Fund. Some of the simulations presented in this paper were carried out using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d’Aquitaine (see https://www.plafrim.fr/en/home/). The remaining simulation resources were provided by the computing facilities MCIA (Mésocentre de Calcul Intensif Aquitain) of the Université de Bordeaux and of the Université de Pau et des Pays de l’Adour.

Funders: Vanderbilt Institutional Fund

    Keywords

    • checkpointing
    • neuroscience application
    • reservation sequence
    • reservation-based platform
    • scheduling
    • stochastic job
