Abstract
Scientific and big data computations are increasingly being distributed across wide-area networks, and they often require access to remote files. File systems that are directly mounted over wide-area networks transparently support such computations and obviate the need for special-purpose file transfer tools. In typical distributed file systems, access is limited to local sites; in particular, the reach of the Lustre file system implemented over InfiniBand (IB) is limited to at most tens of miles due to the 2.5 ms latency bound. We describe LNet router methods that connect an IB Lustre file system to remote Ethernet clients over wide-area networks. We collect extensive Lustre throughput measurements over 10 Gbps connections with 0-366 ms round-trip times. They demonstrate that Gbps throughput can be sustained over connections spanning the globe. We present Lustre throughput profiles over local and wide-area connections, which show the effects of various buffers and credits; in particular, they highlight the throughput limits for large transfers over wide-area connections. Furthermore, the measurements show the positive effects of pipelining in achieving higher throughput for successive file transfers compared to the rates indicated by the IOzone benchmark.
Original language | English |
---|---|
Title of host publication | 12th Annual IEEE International Systems Conference, SysCon 2018 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1-6 |
Number of pages | 6 |
ISBN (Electronic) | 9781538636640 |
State | Published - May 30 2018 |
Event | 12th Annual IEEE International Systems Conference, SysCon 2018 - Vancouver, Canada Duration: Apr 24 2018 → Apr 26 2018 |
Publication series
Name | 12th Annual IEEE International Systems Conference, SysCon 2018 - Proceedings |
---|---|
Conference
Conference | 12th Annual IEEE International Systems Conference, SysCon 2018 |
---|---|
Country/Territory | Canada |
City | Vancouver |
Period | 04/24/18 → 04/26/18 |
Funding
This work is funded by the Mathematics of Complex, Distributed, Interconnected Systems Program, Office of Advanced Computing Research, U.S. Department of Energy, and by Extreme Scale Systems Center, sponsored by U.S. Department of Defense, and performed at Oak Ridge National Laboratory managed by UT-Battelle, LLC for U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- Lustre
- Network measurements
- Throughput
- Wide-area networks