TY - GEN
T1 - Accelerating Application Bulk Synchronous Writes in HPC Environments
AU - Khan, Awais
AU - Zimmer, Christopher
AU - Atchley, Scott
AU - Miller, Ross
AU - Oral, Sarp
AU - Wang, Feiyi
N1 - Publisher Copyright:
© 2024 is held by the owner/author(s).
PY - 2024/6/3
Y1 - 2024/6/3
N2 - High-bandwidth storage tiers are becoming more common for their capability to absorb high-rate, bursty I/Os. Notably, the designs of these fast storage tiers differ from system to system. The variation of these layers and non-uniform methods of access can pose challenges for applications seeking to run at multiple HPC facilities. Therefore, in this work, we present Spectral, a rapid-output abstraction library to accelerate application bulk-synchronous writes on HPC systems. We design Spectral to enable applications to use high-bandwidth storage, such as node-local storage and distributed write caches (e.g., burst buffers), transparently and without requiring modifications to the application or file system source code. The key idea is to allow applications to spend most of their time performing productive work while not requiring any source code changes, for maximum portability across different HPC architectures. Spectral internally re-routes write-only files through available high-performance I/O resources before ultimately migrating them to the shared global parallel file system. For instance, on Summit, Spectral transparently places application outputs on node-local storage and then uses asynchronous migration to the center-wide GPFS file system. We evaluate Spectral on the Summit HPC system (1024 nodes) using the IOR benchmark and real scientific applications. Spectral shows linear performance scaling, improving application write performance by over an order of magnitude when compared to GPFS.
UR - http://www.scopus.com/inward/record.url?scp=85205030027&partnerID=8YFLogxK
U2 - 10.1145/3660320.3660334
DO - 10.1145/3660320.3660334
M3 - Conference contribution
AN - SCOPUS:85205030027
T3 - SNTA 2024 - Proceedings of the 2024 7th International Workshop on Systems and Network Telemetry and Analytics, Part of: HPDC 2024 - 33rd International Symposium on High-Performance Parallel and Distributed Computing
SP - 7
EP - 14
BT - SNTA 2024 - Proceedings of the 2024 7th International Workshop on Systems and Network Telemetry and Analytics, Part of: HPDC 2024 - 33rd International Symposium on High-Performance Parallel and Distributed Computing
PB - Association for Computing Machinery, Inc
T2 - 7th International Workshop on Systems and Network Telemetry and Analytics, SNTA 2024
Y2 - 3 June 2024 through 7 June 2024
ER -