Predicting output performance of a petascale supercomputer

Bing Xie, Yezhou Huang, Jeffrey S. Chase, Jong Youl Choi, Scott Klasky, Jay Lofstead, Sarp Oral

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

46 Scopus citations

Abstract

In this paper, we develop a model for predicting the output performance of supercomputer file systems under production load. Our target environment is Titan, the 3rd fastest supercomputer in the world, and its Lustre-based multi-stage write path. We observe from Titan that although output performance is highly variable at small time scales, the mean performance is stable and consistent over typical application run times. Moreover, we find that output performance is non-linearly related to its correlated parameters because of interference and saturation at individual stages of the path. These observations enable us to build a predictive model of the expected write times of output patterns and I/O configurations, using feature transformations to capture the non-linear relationships. We identify the candidate features based on the structure of the Lustre/Titan write path, and use feature transformation functions to produce a model space with 135,000 candidate models. By searching this space for the model with the minimal mean square error, we identify a good model and show that it is effective.
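The search described in the abstract (transform each candidate feature, fit a linear regression, keep the model with the lowest mean square error) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data is synthetic, the three features are unnamed placeholders, the four transformation functions are hypothetical choices, and the model space has 64 candidates rather than 135,000. Error is measured on a held-out split, which is also an assumption.

```python
import itertools
import numpy as np

# Hypothetical transformation functions; the abstract does not list the paper's set.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": np.log,
    "sqrt": np.sqrt,
    "inverse": lambda x: 1.0 / x,
}

def fit_mse(X_train, y_train, X_test, y_test):
    """Fit ordinary least squares with an intercept; return the test-set MSE."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residual = A_test @ coef - y_test
    return float(np.mean(residual ** 2))

def search_models(X, y, n_train):
    """Try every per-feature combination of transforms.

    Returns a list of (mse, transform_names) tuples sorted by ascending MSE.
    """
    results = []
    for names in itertools.product(TRANSFORMS, repeat=X.shape[1]):
        # Apply one transform to each feature column.
        Xt = np.column_stack(
            [TRANSFORMS[name](X[:, j]) for j, name in enumerate(names)]
        )
        mse = fit_mse(Xt[:n_train], y[:n_train], Xt[n_train:], y[n_train:])
        results.append((mse, names))
    results.sort(key=lambda r: r[0])
    return results

# Synthetic data (an assumption): three strictly positive placeholder features,
# so that log and inverse are defined, and a non-linear response with small noise.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(1.0, 64.0, size=(n, 3))
y = (2.0 * np.log(X[:, 0]) + 0.5 * np.sqrt(X[:, 1]) + 3.0 / X[:, 2]
     + rng.normal(0.0, 0.05, n))

results = search_models(X, y, n_train=300)
best_mse, best_names = results[0]
print(f"{len(results)} candidate models; best transforms {best_names}, MSE {best_mse:.4f}")
```

Each candidate stays linear in its transformed features, so every fit is an ordinary least-squares solve, and the non-linearity is carried entirely by the choice of transforms. That is what makes an exhaustive search over a large model space tractable.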

Original language: English
Title of host publication: HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 181-192
Number of pages: 12
ISBN (Electronic): 9781450346993
State: Published - Jun 26 2017
Event: 26th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2017 - Washington, United States
Duration: Jun 26 2017 to Jun 30 2017

Publication series

Name: HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 26th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2017
Country/Territory: United States
City: Washington
Period: 06/26/17 to 06/30/17

Funding

This work was supported by the U.S. Department of Energy, under FWP 16-018666, program manager Lucy Nowell. The work used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Keywords

  • Linear regression
  • Output performance
  • Petascale supercomputer
