Abstract
Accepting a new system requires extensive testing, often comprising hundreds of tests. Summit, the latest flagship supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) and the number one system on the November 2018 Top500 list [2], completed its acceptance testing in 2018. To execute acceptance, the acceptance test (AT) team uses the OLCF test harness, a tool developed at the OLCF that automates the launch and verification of all acceptance tests. Acceptance requires analyzing test results and classifying every test failure, and the sheer number of tests involved makes these tasks challenging. To complete them more efficiently and to lessen the personnel burden during acceptance testing, we developed Harmony, a monitoring system for the OLCF test harness.
Original language | English |
---|---|
Title of host publication | Proceedings of the Practice and Experience in Advanced Research Computing |
Subtitle of host publication | Rise of the Machines (Learning), PEARC 2019 |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450372275 |
State | Published - Jul 28 2019 |
Event | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019, Chicago, United States (Jul 28 2019 → Aug 1 2019) |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 2019 Conference on Practice and Experience in Advanced Research Computing: Rise of the Machines (Learning), PEARC 2019 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 07/28/19 → 08/01/19 |
Funding
We would like to thank Don Maxwell, Jason Kincl, Arnold Tharrington, and Wayne Joubert for their contributions to this project. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was supported in part by an appointment to the Oak Ridge National Laboratory Oak Ridge Science Semester Program sponsored by the U.S. Department of Energy and administered by the Oak Ridge Institute for Science and Education.
Keywords
- High performance computing
- Large-scale system testing