Abstract
Scientific computing workloads at HPC facilities have been shifting from traditional numerical simulations to AI/ML applications for training and inference, while processing and producing ever-increasing amounts of scientific data. To address the growing need for larger storage capacity, lower access latency, and higher bandwidth, emerging technologies such as non-volatile memory are being integrated into supercomputer I/O subsystems. Given these trends, we need a better understanding of multilayer supercomputer I/O subsystems and of how to use them efficiently. In this work, we study the I/O access patterns and performance characteristics of two representative supercomputer I/O subsystems. Through an extensive analysis of year-long I/O logs on each system, we report new observations on I/O read and write patterns, unbalanced use of storage system layers, and new trends in user behavior across the HPC I/O middleware stack.
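To make the kind of analysis described above concrete, the sketch below shows how aggregate read/write volumes per storage layer might be derived from facility I/O logs. It is a minimal illustration only: the CSV schema (`layer`, `read_bytes`, `write_bytes`) and the file name `io_log.csv` are hypothetical and do not correspond to the actual log formats or tools used in the paper.

```python
# Minimal sketch: aggregate read/write traffic per storage layer from a
# simplified, hypothetical CSV I/O log. Not the paper's actual pipeline.
from collections import defaultdict
import csv

def summarize(log_path):
    """Sum read/write bytes per storage layer from a CSV log file."""
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            layer = row["layer"]  # e.g. "burst_buffer" or "parallel_fs" (assumed field)
            totals[layer]["read"] += int(row["read_bytes"])
            totals[layer]["write"] += int(row["write_bytes"])
    return totals

if __name__ == "__main__":
    for layer, t in summarize("io_log.csv").items():
        ratio = t["read"] / t["write"] if t["write"] else float("inf")
        print(f"{layer}: read={t['read']} B, write={t['write']} B, read/write={ratio:.2f}")
```

Comparing per-layer totals in this way is one simple proxy for the "unbalanced use of storage system layers" that the abstract refers to.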
Original language | English |
---|---|
Title of host publication | HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing |
Publisher | Association for Computing Machinery, Inc |
Pages | 43-55 |
Number of pages | 13 |
ISBN (Electronic) | 9781450391993 |
DOIs | |
State | Published - Jun 27 2022 |
Event | 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022 - Virtual, Online, United States; Duration: Jun 27 2022 → Jun 30 2022 |
Publication series
Name | HPDC 2022 - Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing |
---|---|
Conference
Conference | 31st International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2022 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 06/27/22 → 06/30/22 |
Funding
This work used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the National Energy Research Scientific Computing Center under Contract No. DE-AC02-05CH11231. This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357.
Keywords
- access patterns
- high-performance computing
- in-system storage
- parallel file systems
- production system