Abstract
High-bandwidth storage tiers are becoming more common because they can absorb high-rate, bursty I/O. Notably, the designs of these fast storage tiers differ from system to system. The variation among these layers and their non-uniform methods of access can pose challenges for applications seeking to run at multiple HPC facilities. Therefore, in this work, we present Spectral, a rapid-output abstraction library to accelerate bulk-synchronous application writes on HPC systems. We design Spectral to enable applications to use high-bandwidth storage, such as node-local storage and distributed write caches (e.g., burst buffers), transparently and without modifications to the application or file system source code. The key idea is to let applications spend most of their time performing productive work while requiring no source code changes, for maximum portability across HPC architectures. Spectral internally re-routes write-only files through available high-performance I/O resources before ultimately migrating them to the shared global parallel file system. For instance, on Summit, Spectral transparently places application outputs on node-local storage and then uses asynchronous migration to move them to the center-wide GPFS file system. We evaluate Spectral on the Summit HPC system (1024 nodes) using the IOR benchmark and real scientific applications. Spectral shows linear performance scaling, improving application write performance by over an order of magnitude when compared to GPFS.
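The write path described in the abstract (write to a fast tier first, migrate to the shared file system in the background) can be illustrated with a minimal Python sketch. This is not Spectral's implementation or API: the `StagedWriter` class, its method names, and the `demo` helper are hypothetical, and two temporary directories stand in for node-local storage and the parallel file system.

```python
import queue
import shutil
import tempfile
import threading
from pathlib import Path


class StagedWriter:
    """Hypothetical helper: write to a fast tier, migrate in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)   # stands in for node-local storage
        self.slow_dir = Path(slow_dir)   # stands in for the shared file system
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._migrate_loop, daemon=True)
        self._worker.start()

    def write(self, name, data):
        """Write the file to the fast tier and queue it for migration."""
        staged = self.fast_dir / name
        staged.write_bytes(data)
        # The caller returns here; the copy to the slow tier happens later.
        self._pending.put(staged)

    def _migrate_loop(self):
        """Background thread: copy each staged file, then free the fast tier."""
        while True:
            staged = self._pending.get()
            if staged is None:           # sentinel: stop the worker
                self._pending.task_done()
                break
            shutil.copyfile(staged, self.slow_dir / staged.name)
            staged.unlink()
            self._pending.task_done()

    def close(self):
        """Wait for all queued migrations to finish, then stop the worker."""
        self._pending.join()
        self._pending.put(None)
        self._worker.join()


def demo():
    """Write one file and report what ends up on each tier."""
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        writer = StagedWriter(fast, slow)
        writer.write("checkpoint_0.dat", b"x" * 1024)
        writer.close()
        migrated = sorted(p.name for p in Path(slow).iterdir())
        leftover = list(Path(fast).iterdir())
        return migrated, leftover
```

In this sketch the application-facing `write` call costs only a fast-tier write, which is the property the abstract attributes to the approach. Unlike the sketch, the library described above works transparently, without the application calling a new interface.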
| Original language | English |
|---|---|
| Title of host publication | SNTA 2024 - Proceedings of the 2024 7th International Workshop on Systems and Network Telemetry and Analytics, Part of |
| Subtitle of host publication | HPDC 2024 - 33rd International Symposium on High-Performance Parallel and Distributed Computing |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 7-14 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400706486 |
| State | Published - Jun 3 2024 |
| Event | 7th International Workshop on Systems and Network Telemetry and Analytics, SNTA 2024 - Pisa, Italy. Duration: Jun 3 2024 → Jun 7 2024 |
Publication series
| Name | SNTA 2024 - Proceedings of the 2024 7th International Workshop on Systems and Network Telemetry and Analytics, Part of: HPDC 2024 - 33rd International Symposium on High-Performance Parallel and Distributed Computing |
|---|---|
Conference
| Conference | 7th International Workshop on Systems and Network Telemetry and Analytics, SNTA 2024 |
|---|---|
| Country/Territory | Italy |
| City | Pisa |
| Period | 06/3/24 → 06/7/24 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at the Oak Ridge National Laboratory, which is supported by the Office of Science of the DOE under Contract DE-AC05-00OR22725.