Abstract
Distributed scientific and big-data computations increasingly depend on access to remote files. Wide-area file transfers are supported by two basic schemes: (i) application-level tools, such as GridFTP, that provide transport services between file systems housed at geographically separated sites, and (ii) file systems mounted over wide-area networks, using mechanisms such as LNet routers that make them transparently available. In both cases, file transfer performance depends critically on the configuration of host, file, IO, and disk subsystems, which are complex in themselves, and on their compositions, which are implemented using buffers and IO-network data transitions. We present extensive file transfer rate measurements collected over dedicated 10 Gbps connections with 0-366 ms round-trip times, using the GridFTP and XDD file transfer tools and the Lustre file system extended over wide-area networks using LNet routers. Our test configurations comprise three types of host systems; XFS, Lustre, and ext3 file systems; and Ethernet and SONET wide-area connections. We present analytics based on the convexity-concavity of throughput profiles, which provide insights into throughput and whether it trends above or below linear interpolations. We propose the utilization-concavity coefficient, a scalar metric that characterizes the overall performance of any file transfer method consisting of a specific configuration and scheme. Our results enable performance optimizations by highlighting the significant roles of (i) buffer sizes and parallelism in GridFTP and XDD, and (ii) buffer utilization and the credit mechanism in LNet routers.
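The convexity-concavity comparison against linear interpolation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's analysis code, and it does not compute the utilization-concavity coefficient, whose definition is not given on this page. The function name `classify_profile` and the sample (RTT, throughput) values are hypothetical. Each interior measurement is compared with the straight line joining its two neighbours: a point above that line suggests a locally concave profile, a point below it a locally convex one.

```python
from typing import List, Tuple


def classify_profile(points: List[Tuple[float, float]],
                     tol: float = 1e-9) -> List[Tuple[float, str]]:
    """Label each interior point of a throughput profile.

    points: (rtt_ms, throughput_gbps) pairs with distinct RTTs.
    Returns (rtt_ms, label) pairs, where label is "concave", "convex",
    or "linear" relative to the interpolation between the two neighbours.
    """
    pts = sorted(points)  # order by RTT
    labels = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        # Linear interpolation between the neighbours, evaluated at x1.
        interp = y0 + (y2 - y0) * (x1 - x0) / (x2 - x0)
        if y1 > interp + tol:
            labels.append((x1, "concave"))  # measured rate beats interpolation
        elif y1 < interp - tol:
            labels.append((x1, "convex"))   # measured rate falls short of it
        else:
            labels.append((x1, "linear"))
    return labels


# Hypothetical profile, for illustration only (not measured data).
sample = [(0.0, 10.0), (10.0, 9.0), (20.0, 6.0), (30.0, 5.0)]
result = classify_profile(sample)
```

For the hypothetical sample, the point at 10 ms lies above the line through its neighbours and the point at 20 ms lies below it, so the profile would be read as concave at low RTT and convex at higher RTT.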
Original language | English |
---|---|
Title of host publication | ICDCN 2019 - Proceedings of the 2019 International Conference on Distributed Computing and Networking |
Publisher | Association for Computing Machinery |
Pages | 183-192 |
Number of pages | 10 |
ISBN (Electronic) | 9781450360944 |
State | Published - Jan 4 2019 |
Event | 20th International Conference on Distributed Computing and Networking, ICDCN 2019 - Bangalore, India; Duration: Jan 4 2019 → Jan 7 2019 |
Publication series
Name | ACM International Conference Proceeding Series |
---|---|
Conference
Conference | 20th International Conference on Distributed Computing and Networking, ICDCN 2019 |
---|---|
Country/Territory | India |
City | Bangalore |
Period | 01/4/19 → 01/7/19 |
Funding
This work is funded by the RAMSES project and the Applied Mathematics Program, Office of Advanced Computing Research, U.S. Department of Energy, and by the Extreme Scale Systems Center, sponsored by the U.S. Department of Defense, and was performed at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- Lustre file system
- Network measurements
- Throughput profile
- Wide-area networks