Output performance study on a production petascale filesystem

Bing Xie, Jeffrey S. Chase, David Dillow, Scott Klasky, Jay Lofstead, Sarp Oral, Norbert Podhorszki

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

This paper reports our observations from a top-tier supercomputer Titan and its Lustre parallel file stores under production load. In summary, we find that supercomputer file systems are highly variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write-sharing of files across clients (compute nodes). I/O parallelism is most effective when the application—or its I/O middleware system—distributes the I/O load so that each client writes separate files on multiple targets, and each target stores files for multiple clients, in a balanced way. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good spots” in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe regular diurnal load patterns.

Original language: English
Title of host publication: High Performance Computing - ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Revised Selected Papers
Editors: Rio Yokota, Julian M. Kunkel, Michela Taufer, John Shalf
Publisher: Springer Verlag
Pages: 187-200
Number of pages: 14
ISBN (Print): 9783319676296
State: Published - 2017
Externally published: Yes
Event: 32nd International Conference on High Performance Computing, ISC High Performance 2017 - Frankfurt, Germany
Duration: Jun 18 2017 → Jun 22 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10524 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 32nd International Conference on High Performance Computing, ISC High Performance 2017
Country/Territory: Germany
City: Frankfurt
Period: 06/18/17 → 06/22/17

Funding

This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725. The work was supported by the U.S. Department of Energy, under FWP 16-018666, program manager Lucy Nowell.

Keywords

  • Output performance
  • Parallel I/O
  • Petascale filesystem
