Harmony: A harness monitoring system for the Oak Ridge Leadership Computing Facility

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Acceptance of a new system requires extensive testing and often comprises hundreds of tests. Summit, the latest flagship supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the number one system in the November 2018 Top500 list [2], completed its acceptance testing in 2018. To execute acceptance, the acceptance test (AT) team uses the OLCF test harness, a tool developed at the OLCF that automates the launch and verification of all acceptance tests. Acceptance requires analysis of test results and classification of all test failures. The sheer number of tests involved makes performing these tasks challenging. To complete these tasks more efficiently, and to lessen the personnel burden during acceptance testing, we developed a monitoring system for the OLCF test harness called Harmony.

Original language: English
Title of host publication: Proceedings of the Practice and Experience in Advanced Research Computing
Subtitle of host publication: Rise of the Machines (Learning), PEARC 2019
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450372275
State: Published - Jul 28 2019
Event: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 - Chicago, United States
Duration: Jul 28 2019 to Aug 1 2019

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019
Country/Territory: United States
City: Chicago
Period: 07/28/19 to 08/1/19

Funding

We would like to thank Don Maxwell, Jason Kincl, Arnold Tharrington, and Wayne Joubert for their contributions to this project. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was supported in part by an appointment to the Oak Ridge National Laboratory Oak Ridge Science Semester Program sponsored by the U.S. Department of Energy and administered by the Oak Ridge Institute for Science and Education.

Keywords

  • High performance computing
  • Large-scale system testing
