TY - GEN
T1 - The Challenge of Disproportionate Importance of Temporal Features in Predicting HPC Power Consumption
AU - Li, Chengcheng
AU - Karimi, Ahmad M.
AU - Shin, Woong
AU - Qi, Hairong
AU - Wang, Feiyi
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - In this work, we demonstrate the challenges in predicting HPC cluster power consumption in the face of significant temporal skew in power consumption behavioral patterns. Predicting large power swings of several megawatts has significant operational value for HPC centers; however, prediction is challenging due to the relative rarity of such events and their abrupt or disjoint deviation from average power consumption levels. To study the impact of this challenge, we trained a recurrent neural network (RNN), as a reasonably sophisticated model, to predict power consumption using one year's worth of node power consumption data from the Summit supercomputer located at the Oak Ridge Leadership Computing Facility. By studying the prediction results, we found that although simple usage of RNN models can provide good results on average power consumption levels, it fails to predict the power swings that have more operational value. Based on these results, we discuss potential next steps in addressing such issues, aiming towards robust usage of power prediction techniques in HPC operations.
AB - In this work, we demonstrate the challenges in predicting HPC cluster power consumption in the face of significant temporal skew in power consumption behavioral patterns. Predicting large power swings of several megawatts has significant operational value for HPC centers; however, prediction is challenging due to the relative rarity of such events and their abrupt or disjoint deviation from average power consumption levels. To study the impact of this challenge, we trained a recurrent neural network (RNN), as a reasonably sophisticated model, to predict power consumption using one year's worth of node power consumption data from the Summit supercomputer located at the Oak Ridge Leadership Computing Facility. By studying the prediction results, we found that although simple usage of RNN models can provide good results on average power consumption levels, it fails to predict the power swings that have more operational value. Based on these results, we discuss potential next steps in addressing such issues, aiming towards robust usage of power prediction techniques in HPC operations.
KW - HPC
KW - Machine learning
KW - Power consumption
KW - Time-series prediction
UR - http://www.scopus.com/inward/record.url?scp=85125997166&partnerID=8YFLogxK
U2 - 10.1109/Cluster48925.2021.00094
DO - 10.1109/Cluster48925.2021.00094
M3 - Conference contribution
AN - SCOPUS:85125997166
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 632
EP - 636
BT - Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Y2 - 7 September 2021 through 10 September 2021
ER -