Abstract
We introduce UnifyFS, a user-level file system that aggregates node-local storage tiers available on high performance computing (HPC) systems and makes them available to HPC applications under a unified namespace. UnifyFS employs transparent I/O interception, so it does not require changes to application code and is compatible with commonly used HPC I/O libraries. The design of UnifyFS supports the predominant HPC I/O workloads and is optimized for bulk-synchronous I/O patterns. Furthermore, UnifyFS provides customizable file system semantics to flexibly adapt its behavior for diverse I/O workloads and storage devices. In this paper, we discuss the unique design goals and architecture of UnifyFS and evaluate its performance on a leadership-class HPC system. In our experimental results, we demonstrate that UnifyFS exhibits excellent scaling performance for write operations and can improve the performance of application checkpoint operations by as much as 3× versus a tuned configuration.
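To illustrate the usage model the abstract describes, the sketch below shows a bulk-synchronous, per-rank checkpoint write issued with ordinary POSIX calls; because UnifyFS intercepts I/O transparently, such code needs no modification. The mount prefix `/unifyfs`, the per-rank file layout, and the assumption that the UnifyFS servers are already running and the interception library is linked or preloaded are illustrative assumptions, not details taken from the paper.

```c
/*
 * Minimal sketch (not from the paper): a bulk-synchronous checkpoint phase
 * in which each MPI rank writes its own file under an assumed UnifyFS
 * namespace prefix. The application uses only standard POSIX I/O, which
 * UnifyFS is designed to intercept transparently.
 */
#include <fcntl.h>
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Hypothetical per-rank checkpoint file under the UnifyFS prefix. */
    char path[256];
    snprintf(path, sizeof(path), "/unifyfs/ckpt/rank_%d.dat", rank);

    const size_t nbytes = 1 << 20;           /* 1 MiB of checkpoint data */
    char *buf = malloc(nbytes);
    memset(buf, rank & 0xff, nbytes);

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); MPI_Abort(MPI_COMM_WORLD, 1); }
    if (write(fd, buf, nbytes) != (ssize_t)nbytes) {
        perror("write");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    close(fd);
    free(buf);

    /* Bulk-synchronous pattern: every rank completes its write phase
     * before any rank moves on to consume or read the checkpoint. */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```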
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 290-300 |
| Number of pages | 11 |
| ISBN (Electronic) | 9798350337662 |
| DOIs | |
| State | Published - 2023 |
| Event | 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 - St. Petersburg, United States (Duration: May 15 2023 → May 19 2023) |
Publication series

| Field | Value |
|---|---|
| Name | Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 |
Conference

| Field | Value |
|---|---|
| Conference | 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 |
| Country/Territory | United States |
| City | St. Petersburg |
| Period | 05/15/23 → 05/19/23 |
Funding
Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Acknowledgments: This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This work was performed under the auspices of the U.S. Department of Energy by Oak Ridge National Laboratory under Contract DE-AC05-00OR22725 and Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-CONF-829824.
Keywords
- Distributed file systems
- Parallel I/O
- Parallel systems
- Storage devices
- Storage hierarchies