Data transfer between scientific facilities - Bottleneck analysis, insights and optimizations

Yuanlai Liu, Zhengchun Liu, Rajkumar Kettimuthu, Nageswara Rao, Zizhong Chen, Ian Foster

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

22 Scopus citations

Abstract

Wide-area file transfers play an important role in many science applications. File transfer tools typically deliver the highest performance for datasets with a small number of large files, but many science datasets consist of many small files. Thus, it is important to understand the factors that contribute to the decrease in wide-area data transfer performance for datasets with many small files. To this end, we (i) benchmark the performance of the subsystems involved in end-to-end file transfer between two HPC facilities for a many-file dataset that is representative of production science transfers; (ii) characterize the per-file overhead introduced by different subsystems; (iii) identify potential dependencies and bottlenecks; (iv) study the effectiveness of transferring many files concurrently as a means of reducing per-file overheads; and (v) prototype a prefetching mechanism as an alternative to concurrency for reducing the per-file overhead on the source storage system. We show that both concurrency and prefetching can significantly reduce the per-file overhead. A reasonable level of concurrency combined with prefetching can bring the per-file overhead down to a negligible level.
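To make the per-file overhead argument concrete, the sketch below is a toy model of our own, not the model or measurements from the paper. It estimates the time to move N equal-sized files when each file pays a fixed overhead before its data moves. The function name `transfer_time` is hypothetical, and the model assumes that concurrent streams split the link bandwidth evenly and that prefetching fully overlaps the next file's overhead with the current file's data movement. All parameter values are made up for illustration.

```python
import math


def transfer_time(n_files: int, file_size: float, bandwidth: float,
                  overhead: float, concurrency: int = 1,
                  prefetch: bool = False) -> float:
    """Toy estimate (seconds) of moving n_files equal-sized files.

    Hypothetical model, not taken from the paper:
      * each file pays a fixed per-file `overhead` (seconds) before its data moves;
      * `concurrency` streams split `bandwidth` (bytes/s) evenly;
      * with `prefetch`, the overhead of the next file in a stream is
        overlapped with the data movement of the current file.
    """
    if n_files <= 0:
        return 0.0
    streams = max(1, min(concurrency, n_files))
    files_per_stream = math.ceil(n_files / streams)
    # Time to move one file's bytes when the link is shared by `streams` streams.
    data_time = file_size * streams / bandwidth
    if not prefetch:
        # Overhead and data movement alternate serially within each stream.
        return files_per_stream * (overhead + data_time)
    # Two-stage pipeline: the first overhead is exposed, each later step is
    # bounded by the slower of overhead and data movement, then the last file's data.
    return overhead + (files_per_stream - 1) * max(overhead, data_time) + data_time


if __name__ == "__main__":
    # Illustrative (made-up) parameters: 1000 files of 1 MB, 1 GB/s link, 10 ms per file.
    n, size, bw, oh = 1000, 1e6, 1e9, 0.01
    ideal = n * size / bw  # data movement alone, with no per-file overhead
    for c, pf in [(1, False), (1, True), (16, False), (16, True)]:
        t = transfer_time(n, size, bw, oh, concurrency=c, prefetch=pf)
        print(f"concurrency={c:2d} prefetch={pf!s:5}: {t:6.3f} s "
              f"(overhead share {(t - ideal) / t:.0%})")
```

With these made-up numbers the model gives roughly 11.0 s for one file at a time, about 10.0 s with prefetching alone (each 10 ms overhead still exceeds the 1 ms of data time, so it cannot be hidden), about 1.64 s with 16 streams, and about 1.02 s with 16 streams plus prefetching, against an ideal of 1.0 s. The qualitative pattern, where concurrency amortizes overhead and prefetching hides what remains, is the point; the absolute values depend entirely on the assumed parameters.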

Original language: English
Title of host publication: Proceedings - 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 122-131
Number of pages: 10
ISBN (Electronic): 9781728109121
State: Published - May 2019
Event: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019 - Larnaca, Cyprus
Duration: May 14 2019 to May 17 2019

Publication series

Name: Proceedings - 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019

Conference

Conference: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019
Country/Territory: Cyprus
City: Larnaca
Period: 05/14/19 to 05/17/19

Funding

This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357. Z. Liu and Y. Liu contributed equally to this research. We gratefully acknowledge the National Energy Research Scientific Computing Center and the Argonne Leadership Computing Facility for providing us with resources.

Funders (funder number)
U.S. Department of Energy
Office of Science (DE-AC02-06CH11357)

    Keywords

    • Data transfer
    • GridFTP
    • Model
    • Optimization
