Abstract
The I/O subsystem for the Summit supercomputer, ranked No. 1 on the Top500 list, and its ecosystem of analysis platforms is composed of two distinct layers: the in-system layer and the center-wide parallel file system (PFS) layer, Spider 3. The in-system layer uses node-local SSDs and provides 26.7 TB/s for reads, 9.7 TB/s for writes, and 4.6 billion IOPS to Summit. The Spider 3 PFS layer uses IBM Spectrum Scale and provides 2.5 TB/s and 2.6 million IOPS to Summit and other systems. While deploying the storage as two distinct layers was operationally efficient, it also presented usability challenges: multiple mount points and a lack of transparency in data movement between the layers. To address these challenges, we have developed novel end-to-end I/O solutions for the concerted use of the two storage layers. We present the I/O subsystem architecture, the end-to-end I/O solution space, their design considerations, and our deployment experience.
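One common pattern for the concerted use of two such layers is burst-buffer staging: write each checkpoint to the fast node-local SSDs first, then drain it to the center-wide PFS in the background while computation continues. The sketch below illustrates that pattern in Python under Summit-style assumptions; the mount points (/mnt/bb/&lt;user&gt; for the node-local NVMe, /gpfs/alpine for the PFS), file names, and helper functions are illustrative stand-ins, not the end-to-end solutions presented in the paper.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

# Assumed, Summit-style paths for this sketch only: node-local NVMe is
# typically exposed per user under /mnt/bb, and the center-wide PFS under
# /gpfs/alpine. Neither path is taken from the paper itself.
BB_DIR = os.path.join("/mnt/bb", os.environ.get("USER", "user"))
PFS_DIR = "/gpfs/alpine/proj-shared/demo"
os.makedirs(PFS_DIR, exist_ok=True)

def write_checkpoint(step: int, payload: bytes) -> str:
    """Write a checkpoint to the fast node-local layer first."""
    path = os.path.join(BB_DIR, f"ckpt_{step:06d}.bin")
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ensure the bytes reach the local SSD
    return path

def drain_to_pfs(local_path: str) -> str:
    """Stage a finished checkpoint out to the center-wide PFS."""
    dest = os.path.join(PFS_DIR, os.path.basename(local_path))
    shutil.copy2(local_path, dest)
    os.remove(local_path)  # reclaim scarce burst-buffer capacity
    return dest

# Overlap compute with draining: the copy to the PFS runs in the
# background while the application proceeds to the next step.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = []
    for step in range(3):
        ckpt = write_checkpoint(step, os.urandom(1 << 20))  # 1 MiB dummy data
        futures.append(pool.submit(drain_to_pfs, ckpt))
    for fut in futures:
        print("drained to", fut.result())
```

In a pattern like this, the application only ever pays the (fast) node-local write latency on its critical path, while the slower transfer to the shared PFS is hidden behind subsequent computation; the "multiple mount points" usability challenge noted in the abstract is visible here in the two distinct path prefixes the code must manage.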
Original language | English |
---|---|
Title of host publication | Proceedings of SC 2019 |
Subtitle of host publication | The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | IEEE Computer Society |
ISBN (Electronic) | 9781450362290 |
DOIs | |
State | Published - Nov 17, 2019 |
Event | 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 - Denver, United States |
Duration | Nov 17, 2019 → Nov 22, 2019 |
Publication series
Name | International Conference for High Performance Computing, Networking, Storage and Analysis, SC |
---|---|
ISSN (Print) | 2167-4329 |
ISSN (Electronic) | 2167-4337 |
Conference
Conference | 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 11/17/19 → 11/22/19 |
Funding
This work was performed under the auspices of the U.S. DOE by the Oak Ridge Leadership Computing Facility at ORNL under contract DE-AC05-00OR22725. The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- High performance computing
- Parallel I/O
- Parallel file systems
- Performance benchmarking