A Comprehensive Informative Metric for Summarizing HPC System Status

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

It remains a major challenge to summarize and visualize, in a comprehensive form, the status of a complex computer system such as the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF). In the ongoing research highlighted in this poster, we present system information entropy (SIE), a newly developed system metric that leverages traditional machine learning techniques and information theory. By compressing the multivariate, multidimensional event information recorded during the operation of the targeted system into a single SIE time series, we demonstrate that the historical system status can be sensitively summarized in the form of SIE and visualized concisely and comprehensively.
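The abstract does not give the SIE formula, so the following is only a minimal sketch of the general idea of collapsing categorical event logs into one entropy time series. It assumes, purely for illustration, that each log record is a (timestamp, event type) pair and that the per-window value is the Shannon entropy of the event-type frequencies in that window. The function name `windowed_event_entropy`, the window length, and the event names are hypothetical and are not taken from the paper, and the machine learning component mentioned in the abstract is not modeled here.

```python
import math
from collections import Counter, defaultdict


def windowed_event_entropy(events, window_seconds):
    """Map each time window to the Shannon entropy (in bits) of its event types.

    `events` is an iterable of (timestamp_in_seconds, event_type) pairs.
    This is an illustrative stand-in, not the SIE definition from the paper.
    """
    # Group event-type counts by the start time of the window they fall in.
    buckets = defaultdict(Counter)
    for timestamp, event_type in events:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start][event_type] += 1

    # Compute H = -sum(p * log2(p)) over event-type frequencies per window.
    series = {}
    for window_start in sorted(buckets):
        counts = buckets[window_start]
        total = sum(counts.values())
        entropy = 0.0
        for count in counts.values():
            p = count / total
            entropy -= p * math.log2(p)
        series[window_start] = entropy
    return series


# Hypothetical event log: two event types in the first minute, one in the second.
events = [
    (0, "ecc_error"), (10, "ecc_error"), (20, "link_down"), (30, "link_down"),
    (60, "ecc_error"), (70, "ecc_error"),
]
series = windowed_event_entropy(events, window_seconds=60)
```

Under these assumptions, a window with an even mix of two event types yields a higher value than a window dominated by a single type, which is the kind of single-number summary per time step that the abstract describes.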

Original language: English
Title of host publication: 2018 IEEE 8th Symposium on Large Data Analysis and Visualization, LDAV 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 102-103
Number of pages: 2
ISBN (Electronic): 9781538668733
DOIs
State: Published - Oct 2018
Event: 8th IEEE Symposium on Large Data Analysis and Visualization, LDAV 2018 - Berlin, Germany
Duration: Oct 21 2018 → …

Publication series

Name: 2018 IEEE 8th Symposium on Large Data Analysis and Visualization, LDAV 2018

Conference

Conference: 8th IEEE Symposium on Large Data Analysis and Visualization, LDAV 2018
Country/Territory: Germany
City: Berlin
Period: 10/21/18 → …

Funding

This work was sponsored by the U.S. Department of Energy's Office of Advanced Scientific Computing Research, program manager Dr. Lucy Nowell. This work was also supported by the Compute and Data Environment for Science (CADES) facility and the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

Funder: Funder number
Compute and Data Environment for Science
U.S. Department of Energy: DE-AC05-00OR22725
Advanced Scientific Computing Research
Oak Ridge National Laboratory

    Keywords

    • General and reference → Metrics
    • Human-centered computing → Visual analytics
    • Mathematics of computing → Time series analysis
