Analysis and Modeling of the End-to-End I/O Performance on OLCF's Titan Supercomputer

Lipeng Wan, Matthew Wolf, Feiyi Wang, Jong Youl Choi, George Ostrouchov, Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Scopus citations

Abstract

As leadership-class scientific computation and simulation applications grow in scale and complexity, it has become more important to understand their I/O performance characteristics. The performance a user observes depends both on how the application uses the HPC facility and on how other users' activity causes the delivered performance to vary from the machine's static capabilities. Our work leverages statistical analysis of I/O performance data gathered with fine time resolution over a full week from the Titan supercomputer. Based on observed properties of the distribution of I/O latencies, we build a three-state hidden Markov model (HMM) to characterize the end-to-end I/O performance on Titan. We parameterize our model using part of the field-gathered I/O performance data and validate it against the rest. The validation results demonstrate that our model can capture the dynamics of end-to-end I/O performance on Titan accurately.
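
To illustrate the kind of model the abstract describes, the following is a minimal sketch of a three-state HMM over I/O latencies, scored on a held-out portion of a trace. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian emission densities, the state interpretation (fast, moderate, congested), the parameter values, the 70/30 split, and the helper names `simulate` and `forward_loglik`. The trace is synthetic, not Titan data.

```python
import numpy as np


def simulate(n, pi, A, means, stds, rng):
    """Draw n synthetic latencies from a Gaussian-emission HMM (illustrative only)."""
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)
    for t in range(1, n):
        states[t] = rng.choice(len(pi), p=A[states[t - 1]])
    return rng.normal(means[states], stds[states])


def forward_loglik(obs, pi, A, means, stds):
    """Log-likelihood of a 1-D sequence under a Gaussian-emission HMM,
    computed with the scaled forward algorithm."""
    obs = np.asarray(obs, dtype=float)
    # B[t, k] = density of obs[t] under state k's Gaussian
    B = np.exp(-0.5 * ((obs[:, None] - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    alpha = pi * B[0]
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha = alpha / scale
    for t in range(1, len(obs)):
        # Propagate state beliefs one step, then weight by the new observation
        alpha = (alpha @ A) * B[t]
        scale = alpha.sum()
        loglik += np.log(scale)
        alpha = alpha / scale
    return loglik


# Hypothetical parameters: states 0/1/2 stand for fast, moderate, congested I/O.
PI = np.array([0.6, 0.3, 0.1])
A = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
MEANS = np.array([1.0, 5.0, 20.0])   # assumed latency means (arbitrary units)
STDS = np.array([0.3, 1.0, 4.0])     # assumed latency standard deviations

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trace = simulate(2000, PI, A, MEANS, STDS, rng)
    split = int(0.7 * len(trace))      # assumed 70/30 split
    held_out = trace[split:]
    # Average per-sample log-likelihood on the held-out part of the trace
    print(forward_loglik(held_out, PI, A, MEANS, STDS) / len(held_out))
```

In a real workflow the parameters would be estimated from the first part of the trace (for example with Baum-Welch) rather than fixed by hand, and the held-out log-likelihood would be compared against alternative models. A Gaussian emission can also produce negative values, so a distribution with positive support may be a better fit for measured latencies.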

Original language: English
Title of host publication: Proceedings - 2017 IEEE 19th Intl Conference on High Performance Computing and Communications, HPCC 2017, 2017 IEEE 15th Intl Conference on Smart City, SmartCity 2017 and 2017 IEEE 3rd Intl Conference on Data Science and Systems, DSS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-9
Number of pages: 9
ISBN (Electronic): 9781538625880
State: Published - Jul 2 2017
Event: 19th IEEE Intl Conference on High Performance Computing and Communications, 15th IEEE Intl Conference on Smart City, and 3rd IEEE Intl Conference on Data Science and Systems, HPCC/SmartCity/DSS 2017 - Bangkok, Thailand
Duration: Dec 18 2017 → Dec 20 2017

Publication series

Name: Proceedings - 2017 IEEE 19th Intl Conference on High Performance Computing and Communications, HPCC 2017, 2017 IEEE 15th Intl Conference on Smart City, SmartCity 2017 and 2017 IEEE 3rd Intl Conference on Data Science and Systems, DSS 2017
Volume: 2018-January

Conference

Conference: 19th IEEE Intl Conference on High Performance Computing and Communications, 15th IEEE Intl Conference on Smart City, and 3rd IEEE Intl Conference on Data Science and Systems, HPCC/SmartCity/DSS 2017
Country/Territory: Thailand
City: Bangkok
Period: 12/18/17 → 12/20/17
