A Conceptual Framework for HPC Operational Data Analytics

Alessio Netti, Woong Shin, Michael Ott, Torsten Wilde, Natalie Bates

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 596-603
Number of pages: 8
ISBN (Electronic): 9781728196664
State: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States
Duration: Sep 7 2021 to Sep 10 2021

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2021-September
ISSN (Print): 1552-5244

Conference

Conference: 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Country/Territory: United States
City: Virtual, Portland
Period: 09/7/21 to 09/10/21

Funding

Alessio Netti was supported by the European Union’s Horizon 2020/EuroHPC research and innovation programme under grant agreement No. 956560 (REGALE). This work was supported by, and used the resources of, the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT-Battelle, LLC for the U.S. DOE (under contract No. DE-AC05-00OR22725). This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • Energy efficiency
  • Exascale
  • HPC operations
  • Operational data
  • Top500
