TY - GEN
T1 - Reservation strategies for stochastic jobs
AU - Aupy, Guillaume
AU - Gainaru, Ana
AU - Honoré, Valentin
AU - Raghavan, Padma
AU - Robert, Yves
AU - Sun, Hongyang
N1 - Publisher Copyright:
© 2019 IEEE
PY - 2019/5
Y1 - 2019/5
N2 - In this paper, we are interested in scheduling stochastic jobs on a reservation-based platform. Specifically, we consider jobs whose execution time follows a known probability distribution. The platform is reservation-based, meaning that the user has to request fixed-length time slots. The cost then depends on both (i) the request duration (pay for what you ask); and (ii) the actual execution time of the job (pay for what you use). A reservation strategy determines a sequence of increasing-length reservations, which are paid for until one of them allows the job to successfully complete. The goal is to minimize the total expected cost of the strategy. We provide some properties of the optimal solution, which we characterize up to the length of the first reservation. We then design several heuristics based on various approaches, including a brute-force search of the first reservation length while relying on the characterization of the optimal strategy, as well as the discretization of the target continuous probability distribution together with an optimal dynamic programming algorithm for the discrete distribution. We evaluate these heuristics using two different platform models and cost functions: The first one targets a cloud-oriented platform (e.g., Amazon AWS) using jobs that follow a large number of usual probability distributions (e.g., Uniform, Exponential, LogNormal, Weibull, Beta), and the second one is based on interpolating traces from a real neuroscience application executed on an HPC platform. An extensive set of simulation results show the effectiveness of the proposed reservation-based approaches for scheduling stochastic jobs.
AB - In this paper, we are interested in scheduling stochastic jobs on a reservation-based platform. Specifically, we consider jobs whose execution time follows a known probability distribution. The platform is reservation-based, meaning that the user has to request fixed-length time slots. The cost then depends on both (i) the request duration (pay for what you ask); and (ii) the actual execution time of the job (pay for what you use). A reservation strategy determines a sequence of increasing-length reservations, which are paid for until one of them allows the job to successfully complete. The goal is to minimize the total expected cost of the strategy. We provide some properties of the optimal solution, which we characterize up to the length of the first reservation. We then design several heuristics based on various approaches, including a brute-force search of the first reservation length while relying on the characterization of the optimal strategy, as well as the discretization of the target continuous probability distribution together with an optimal dynamic programming algorithm for the discrete distribution. We evaluate these heuristics using two different platform models and cost functions: The first one targets a cloud-oriented platform (e.g., Amazon AWS) using jobs that follow a large number of usual probability distributions (e.g., Uniform, Exponential, LogNormal, Weibull, Beta), and the second one is based on interpolating traces from a real neuroscience application executed on an HPC platform. An extensive set of simulation results show the effectiveness of the proposed reservation-based approaches for scheduling stochastic jobs.
KW - Neuroscience applications
KW - Reservation-based platform
KW - Scheduling
KW - Sequence of requests
KW - Stochastic job
UR - http://www.scopus.com/inward/record.url?scp=85064542470&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2019.00027
DO - 10.1109/IPDPS.2019.00027
M3 - Conference contribution
AN - SCOPUS:85064542470
T3 - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019
SP - 166
EP - 175
BT - Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 33rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2019
Y2 - 20 May 2019 through 24 May 2019
ER -