Abstract
In this work, we demonstrate the challenges of predicting HPC cluster power consumption in the face of significant temporal skew in power consumption behavioral patterns. Predicting large power swings that span several megawatts has significant operational value for HPC centers. However, prediction is challenging because such events are relatively rare and deviate abruptly or disjointly from the average power consumption level. To study the impact of this challenge, we trained a recurrent neural network (RNN), as a reasonably sophisticated model, to predict power consumption using one year's worth of node power consumption data from the Summit supercomputer located at the Oak Ridge Leadership Computing Facility. By studying the prediction results, we found that although a simple use of RNN models can provide good results at average power consumption levels, it fails to predict the power swings that have more operational value. Based on these results, we discuss potential next steps for addressing such issues, aiming toward robust use of power prediction techniques in HPC operations.
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 632-636 |
Number of pages | 5 |
ISBN (Electronic) | 9781728196664 |
DOIs | |
State | Published - 2021 |
Event | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States. Duration: Sep 7 2021 → Sep 10 2021 |
Publication series
Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
---|---|
Volume | 2021-September |
ISSN (Print) | 1552-5244 |
Conference
Conference | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Portland |
Period | 09/07/21 → 09/10/21 |
Funding
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work was supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT-Battelle, LLC for the US DOE (under contract No. DE-AC05-00OR22725).
Keywords
- HPC
- Machine learning
- Power consumption
- Time-series prediction