Abstract
Storage resources in high-performance computing are shared across all user applications. Consequently, storage performance can vary markedly, depending not only on an application's own workload but also on the other activity running concurrently across the system. This variability in storage performance translates directly into variability in overall execution time, confounding efforts to predict job performance for scheduling or capacity planning. I/O variability also complicates the seemingly straightforward process of performance measurement when evaluating application optimizations. In this work we present a methodology to measure I/O contention with more rigor than in prior work. We apply statistical techniques to gain insight from application-level statistics and storage-side logging. We examine different correlation metrics for relating system workload to job I/O performance and identify an effective and generally applicable metric for measuring job I/O performance. We further demonstrate that the granularity of system-wide monitoring can directly affect the strength of the correlation observed: insufficient granularity or too few measurements can hide the correlations between application I/O performance and system-wide I/O activity.
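The granularity effect described in the abstract can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the load model (independent per-minute load values), the linear relation between load and job bandwidth, the choice of Pearson and Spearman coefficients, and the hypothetical helper names `pearson`, `ranks`, `spearman`, and `correlation_by_granularity`. It only shows why averaging system load over coarse windows can weaken the observed correlation with per-job I/O performance.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def correlation_by_granularity(window, n_minutes=6000, n_jobs=200, seed=0):
    """Correlate job bandwidth with system load monitored every `window` minutes.

    Synthetic model (an assumption, not the paper's data): per-minute load is
    independent and uniform, and a job's bandwidth drops linearly with the
    load in the minute the job runs, plus Gaussian noise.
    """
    rng = random.Random(seed)
    load = [rng.uniform(0, 100) for _ in range(n_minutes)]
    job_minutes = rng.sample(range(n_minutes), n_jobs)
    job_bw = [1000 - 8 * load[t] + rng.gauss(0, 20) for t in job_minutes]

    # The monitor only reports the mean load of the window containing the job.
    observed = []
    for t in job_minutes:
        start = (t // window) * window
        chunk = load[start:start + window]
        observed.append(sum(chunk) / len(chunk))
    return pearson(observed, job_bw), spearman(observed, job_bw)


if __name__ == "__main__":
    for window in (1, 10, 60):
        p, s = correlation_by_granularity(window)
        print(f"window={window:>2} min  pearson={p:+.3f}  spearman={s:+.3f}")
```

Under this model, per-minute monitoring recovers a strong negative correlation, while hourly averages largely wash it out. Real system load is autocorrelated, so the effect on a production system would likely be less extreme than in this deliberately simple model.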
Original language | English |
---|---|
Title of host publication | 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781538634868 |
State | Published - Sep 6 2017 |
Externally published | Yes |
Event | 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Shenzhen, China; Duration: Aug 7 2017 → Aug 9 2017 |
Publication series
Name | 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 - Proceedings |
---|---|
Conference
Conference | 2017 IEEE International Conference on Networking, Architecture, and Storage, NAS 2017 |
---|---|
Country/Territory | China |
City | Shenzhen |
Period | 08/7/17 → 08/9/17 |
Funding
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357.