TY - GEN
T1 - Power-capping aware checkpointing
T2 - 46th IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016
AU - Tang, Kun
AU - Tiwari, Devesh
AU - Gupta, Saurabh
AU - Huang, Ping
AU - Lu, Qiqi
AU - Engelmann, Christian
AU - He, Xubin
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/9/29
Y1 - 2016/9/29
N2 - Checkpoint and restart mechanisms have been widely used in large scientific simulation applications to make forward progress in case of failures. However, no prior work has considered the interaction of power constraints with temperature, reliability, performance, and checkpointing interval. It is not clear how power-capping may affect the optimal checkpointing interval, or what reliability, performance, and energy trade-offs are involved. In this paper, we develop a deep understanding of the interaction between power-capping and scientific applications that use checkpoint/restart as a resilience mechanism, and propose a new model for the optimal checkpointing interval (OCI) under power-capping. Our study reveals several interesting, previously unknown insights about how power-capping affects reliability, energy consumption, and performance.
AB - Checkpoint and restart mechanisms have been widely used in large scientific simulation applications to make forward progress in case of failures. However, no prior work has considered the interaction of power constraints with temperature, reliability, performance, and checkpointing interval. It is not clear how power-capping may affect the optimal checkpointing interval, or what reliability, performance, and energy trade-offs are involved. In this paper, we develop a deep understanding of the interaction between power-capping and scientific applications that use checkpoint/restart as a resilience mechanism, and propose a new model for the optimal checkpointing interval (OCI) under power-capping. Our study reveals several interesting, previously unknown insights about how power-capping affects reliability, energy consumption, and performance.
UR - http://www.scopus.com/inward/record.url?scp=84994248942&partnerID=8YFLogxK
U2 - 10.1109/DSN.2016.36
DO - 10.1109/DSN.2016.36
M3 - Conference contribution
AN - SCOPUS:84994248942
T3 - Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016
SP - 311
EP - 322
BT - Proceedings - 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 28 June 2016 through 1 July 2016
ER -