Abstract
Many extreme-scale applications require the movement of large quantities of data to, from, and among leadership computing facilities, as well as other scientific facilities and the home institutions of facility users. These applications, particularly when leadership computing facilities are involved, can touch upon edge cases (e.g., terabyte files) that had not been a focus of previous Globus optimization work, which had instead emphasized the movement of many smaller (megabyte to gigabyte) files. We report here on how automated client-driven chunking can be used to accelerate both the movement of large files and the integrity checking operations that have proven to be essential for large data transfers. We present detailed performance studies that provide insights into the benefits of these modifications in a range of file transfer scenarios.
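The core idea described in the abstract, splitting a large file into chunks so that checksums can be computed over the chunks concurrently rather than over the whole file serially, can be illustrated with a minimal sketch. This is not the paper's (or Globus's) actual implementation; the chunk size, worker count, and function names below are illustrative assumptions.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024 * 1024  # illustrative 64 MiB chunk size (an assumption)


def chunk_checksum(path, offset, length):
    """Compute SHA-256 over one chunk of a file, reading in 1 MiB blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining > 0:
            block = f.read(min(1 << 20, remaining))
            if not block:
                break
            h.update(block)
            remaining -= len(block)
    return offset, h.hexdigest()


def chunked_checksums(path, chunk_size=CHUNK_SIZE, workers=8):
    """Checksum each chunk in parallel; each chunk can be verified (and,
    on mismatch, retransferred) independently of the rest of the file."""
    size = os.path.getsize(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(chunk_checksum, path, off, min(chunk_size, size - off))
            for off in range(0, size, chunk_size)
        ]
        return [f.result() for f in futures]
```

Because each chunk is hashed independently, a terabyte file no longer requires one long serial checksum pass after transfer, and an integrity failure costs only the retransfer of the affected chunk rather than the whole file.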
| Original language | English |
|---|---|
| Pages (from-to) | 658-670 |
| Number of pages | 13 |
| Journal | International Journal of High Performance Computing Applications |
| Volume | 38 |
| Issue number | 6 |
| DOIs | |
| State | Published - Nov 2024 |
Funding
We acknowledge support from the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. We are grateful to the ALCF, NERSC, and OLCF, DOE Office of Science User Facilities supported under Contracts DE-AC02-06CH11357, DE-AC02-05CH11231, and DE-AC05-00OR22725, respectively, for access to computing resources used in experiments. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
Keywords
- High-speed communications
- big data
- exascale computing
- Globus
- integrity checking
- network protocols