Abstract
This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 596-603 |
Number of pages | 8 |
ISBN (Electronic) | 9781728196664 |
State | Published - 2021 |
Externally published | Yes |
Event | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States; Duration: Sep 7 2021 → Sep 10 2021 |
Publication series
Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
---|---|
Volume | 2021-September |
ISSN (Print) | 1552-5244 |
Conference
Conference | 2021 IEEE International Conference on Cluster Computing, Cluster 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Portland |
Period | 09/7/21 → 09/10/21 |
Funding
Alessio Netti was supported by the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No. 956560 (REGALE). This work was supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT Battelle, LLC for the U.S. DOE (under the contract No. DE-AC05-00OR22725). This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- Energy efficiency
- Exascale
- HPC operations
- Operational data
- Top500