Abstract
This paper reports our observations from Titan, a top-tier supercomputer, and its Lustre parallel file stores under production load. In summary, we find that supercomputer file system performance is highly variable across the machine at fine time scales. This variability has two major implications. First, stragglers lessen the benefit of coupled I/O parallelism (striping). Peak median output bandwidths are obtained with parallel writes to many independent files, with no striping or write-sharing of files across clients (compute nodes). I/O parallelism is most effective when the application (or its I/O middleware system) distributes the I/O load in a balanced way, so that each client writes separate files on multiple targets and each target stores files for multiple clients. Second, our results suggest that the potential benefit of dynamic adaptation is limited. In particular, it is not fruitful to attempt to identify “good spots” in the machine or in the file system: component performance is driven by transient load conditions, and past performance is not a useful predictor of future performance. For example, we do not observe regular diurnal load patterns.
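The balanced layout described in the abstract can be illustrated with a minimal sketch. It is not the paper's method or any Lustre API: the function names, the staggered round-robin rule, and the example sizes are all assumptions chosen for illustration. Each file is placed whole on a single target (no striping), each client's files are spread over several targets, and each target ends up holding files from several clients.

```python
from collections import Counter, defaultdict


def assign_files(num_clients, files_per_client, num_targets):
    """Map each (client, file_index) pair to one storage target.

    Hypothetical placement rule: number all files globally and hand them
    to targets round-robin, so consecutive files of one client land on
    consecutive targets and the total load is spread evenly.
    """
    placement = {}
    for client in range(num_clients):
        for k in range(files_per_client):
            seq = client * files_per_client + k  # global file sequence number
            placement[(client, k)] = seq % num_targets
    return placement


def files_per_target(placement):
    """Count how many files each target stores."""
    return Counter(placement.values())


def clients_per_target(placement):
    """Collect the set of distinct clients writing to each target."""
    clients = defaultdict(set)
    for (client, _k), target in placement.items():
        clients[target].add(client)
    return dict(clients)


if __name__ == "__main__":
    # Illustrative sizes only: 8 clients, 4 files each, 8 targets.
    layout = assign_files(num_clients=8, files_per_client=4, num_targets=8)
    print(files_per_target(layout))
    print({t: sorted(c) for t, c in clients_per_target(layout).items()})
```

With this rule the per-target file counts differ by at most one, and a client's files fall on distinct targets whenever `files_per_client` does not exceed `num_targets`. On a Lustre system, keeping each file on one target would correspond to a stripe count of 1; how files are pinned to specific targets is left out of the sketch.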
Original language | English |
---|---|
Title of host publication | High Performance Computing - ISC High Performance 2017 International Workshops, DRBSD, ExaComm, HCPM, HPC-IODC, IWOPH, IXPUG, P^3MA, VHPC, Visualization at Scale, WOPSSS, Revised Selected Papers |
Editors | Rio Yokota, Julian M. Kunkel, Michela Taufer, John Shalf |
Publisher | Springer Verlag |
Pages | 187-200 |
Number of pages | 14 |
ISBN (Print) | 9783319676296 |
State | Published - 2017 |
Externally published | Yes |
Event | 32nd International Conference on High Performance Computing, ISC High Performance 2017 - Frankfurt, Germany; Duration: Jun 18 2017 → Jun 22 2017 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10524 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 32nd International Conference on High Performance Computing, ISC High Performance 2017 |
---|---|
Country/Territory | Germany |
City | Frankfurt |
Period | 06/18/17 → 06/22/17 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725. The work was supported by the U.S. Department of Energy, under FWP 16-018666, program manager Lucy Nowell.
Keywords
- Output performance
- Parallel I/O
- Petascale filesystem