On analytics of file transfer rates over dedicated wide-area connections

Satyabrata Sen, Nageswara S.V. Rao, Qiang Liu, Neena Imam, Rajkumar Kettimuthu, Ian Foster

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

File transfers between decentralized storage sites over dedicated wide-area connections are becoming increasingly important in high-performance computing and big data scenarios. Designing scientific workflows for such large file transfers is extremely challenging, as the transfers depend on the file, I/O, host, and local- and wide-area network subsystems and on their interactions. To gain insights into file-transfer rate profiles, we develop polynomial, bagging, and boosting regression models for Lustre and XFS file transfer measurements, which are collected using XDD over a suite of 10 Gbps connections with 0-366 ms round-trip times (RTTs). In addition to overall trends and analytics, these regressions provide file-transfer rate estimates for RTTs and numbers of parallel flows at which measurements might not have been collected. They show that bagging and boosting techniques provide closer data fits than polynomial regression. We develop probabilistic bounds on the generalization error of these methods, which, combined with the cross-validation error, establish that the former two are more accurate estimators than polynomial regression. In addition, we present a method to efficiently determine the number of parallel flows that achieves the peak file-transfer rate using fewer measurements than a full sweep; in our measurements, the peak is found in 96% of cases using 15-25% of the measurements of a full sweep.
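
The abstract does not describe how the peak-finding method works, so the following is only an illustrative sketch of the general idea of locating a peak with fewer probes than a full sweep. It is not the authors' method. It assumes the transfer rate is strictly unimodal in the number of parallel flows and that each measurement is noise-free, neither of which is guaranteed for real transfers. The names `find_peak_flows`, `measure`, and `synthetic_rate` are hypothetical.

```python
from typing import Callable, Dict, Tuple


def find_peak_flows(
    measure: Callable[[int], float], lo: int, hi: int
) -> Tuple[int, float, int]:
    """Integer ternary search for the flow count with the highest rate.

    Assumes measure(n) is strictly unimodal on [lo, hi] (an assumption of
    this sketch, not a claim from the paper).
    Returns (best_flows, best_rate, number_of_distinct_probes).
    """
    cache: Dict[int, float] = {}

    def probe(n: int) -> float:
        # Each distinct flow count is measured at most once.
        if n not in cache:
            cache[n] = measure(n)
        return cache[n]

    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if probe(m1) < probe(m2):
            lo = m1 + 1  # the peak lies to the right of m1
        else:
            hi = m2 - 1  # the peak lies to the left of m2
    # At most three candidates remain; measure them directly.
    best = max(range(lo, hi + 1), key=probe)
    return best, cache[best], len(cache)


def synthetic_rate(flows: int) -> float:
    # Hypothetical stand-in for a real transfer measurement, peaking at 7 flows.
    return 100.0 - (flows - 7) ** 2


best, rate, probes = find_peak_flows(synthetic_rate, 1, 32)
print(f"best flows={best}, rate={rate}, probes used={probes} of 32")
```

With noisy measurements, a search like this can discard the interval that holds the true peak, so repeated probes or a fitted regression profile would be needed in practice.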

Original language: English
Title of host publication: Proceedings - 13th IEEE International Conference on eScience, eScience 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 576-585
Number of pages: 10
ISBN (Electronic): 9781538626863
State: Published - Nov 14 2017
Event: 13th IEEE International Conference on eScience, eScience 2017 - Auckland, New Zealand
Duration: Oct 24 2017 to Oct 27 2017

Publication series

Name: Proceedings - 13th IEEE International Conference on eScience, eScience 2017

Conference

Conference: 13th IEEE International Conference on eScience, eScience 2017
Country/Territory: New Zealand
City: Auckland
Period: 10/24/17 to 10/27/17

Keywords

  • TCP
  • Wide area transport
  • cross-validation
  • dedicated connections
  • fast probing
  • regression
  • throughput profiling
