Abstract
The Lustre filesystem serves as a vital element in high-performance parallel storage, meeting the rising demands of scientific, research, and enterprise environments. Widely deployed across HPC environments, from small-scale AI/ML applications to domains such as oil and gas, drug discovery, meteorology, and manufacturing, Lustre addresses the universal challenge of efficiently accessing vast and ever-increasing volumes of data. Lustre is the filesystem of choice on six of the top 10 fastest supercomputers in the world today, over 65% of the top 100, and over 60% of the top 500. Despite its widespread popularity, there is no complete and up-to-date reference covering Lustre’s evolution, design, and the various advancements made over the years. In this article, we aim to fill this gap by providing a comprehensive account of Lustre, including its history and significant contributions to HPC, its architecture and design elements, an exploration of the advancements added through its evolution, and future directions. Additionally, we compare Lustre with other prominent storage technologies of the era. To illustrate the current state of Lustre, we analyze several filesystem trends, including utilization, performance, and usage patterns on Orion, the Lustre filesystem of Frontier, the first exascale supercomputer. We hope that this article serves as a comprehensive educational reference for current and future generations interested in HPC filesystem storage.
| Original language | English |
|---|---|
| Article number | 21 |
| Journal | ACM Transactions on Storage |
| Volume | 21 |
| Issue number | 3 |
| DOIs | |
| State | Published - Jun 18 2025 |
Funding
We thank our colleagues from the National Center for Computational Sciences at Oak Ridge National Laboratory, especially Dustin Leverman, for providing access to the Orion filesystem performance evaluations. We acknowledge the “Lustre Software Release 2.x Operations Manual”, which assisted us in writing a few sections related to the resilience and availability aspects of the paper. We also acknowledge the previously published studies on Lustre (the Lustre internals technical report, Lustre white papers by Lustre authors, and the wiki.whamcloud and wiki.lustre documentation pages), referenced in our paper, for enhancing our understanding of the various concepts presented. We thank the anonymous reviewers and the journal editors for their valuable comments and suggestions, which have significantly improved our manuscript. This research was sponsored by and used resources of the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility at Oak Ridge National Laboratory supported by the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- High Performance Computing (HPC)
- Parallel file system
- Frontier exascale supercomputer
- Orion filesystem