Explaining wide area data transfer performance

Zhengchun Liu, Prasanna Balaprakash, Rajkumar Kettimuthu, Ian Foster

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

36 Scopus citations

Abstract

Disk-to-disk wide-area file transfers involve many subsystems and tunable application parameters that pose significant challenges for bottleneck detection, system optimization, and performance prediction. Performance models can be used to address these challenges but have not proved generally usable because of a need for extensive online experiments to characterize subsystems. We show here how to overcome the need for such experiments by applying machine learning methods to historical data to estimate parameters for predictive models. Starting with log data for millions of Globus transfers involving billions of files and hundreds of petabytes, we engineer features for endpoint CPU load, network interface card load, and transfer characteristics; and we use these features in both linear and nonlinear models of transfer performance. We show that the resulting models have high explanatory power. For a representative set of 30,653 transfers over 30 heavily used source-destination pairs ("edges"), totaling 2,053 TB in 46.6 million files, we obtain median absolute percentage prediction errors (MdAPE) of 7.0% and 4.6% when using distinct linear and nonlinear models per edge, respectively; when using a single nonlinear model for all edges, we obtain an MdAPE of 7.8%. Our work broadens understanding of factors that influence file transfer rate by clarifying relationships between achieved transfer rates, transfer characteristics, and competing load. Our predictions can be used for distributed workflow scheduling and optimization, and our features can also be used for optimization and explanation.
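
The MdAPE figures quoted in the abstract are medians of per-transfer absolute percentage errors. The sketch below illustrates that metric only; it is not the authors' code. The function name `mdape` and the sample rates are hypothetical, and it assumes every observed rate is nonzero.

```python
from statistics import median


def mdape(observed, predicted):
    """Median absolute percentage error, in percent.

    Assumes both sequences have the same length and that no observed
    value is zero (a zero would make the percentage error undefined).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    errors = [100.0 * abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return median(errors)


# Hypothetical transfer rates in MB/s; these are NOT values from the paper.
observed = [100.0, 200.0, 400.0, 800.0]
predicted = [110.0, 190.0, 400.0, 600.0]

# Per-transfer errors are 10%, 5%, 0%, and 25%; the median is 7.5%.
print(mdape(observed, predicted))
```

Because the median is used instead of the mean, a single badly mispredicted transfer (the 25% error here) does not dominate the score.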

Original language: English
Title of host publication: HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 167-178
Number of pages: 12
ISBN (Electronic): 9781450346993
State: Published - Jun 26 2017
Externally published: Yes
Event: 26th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2017 - Washington, United States
Duration: Jun 26 2017 to Jun 30 2017

Publication series

Name: HPDC 2017 - Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing

Conference

Conference: 26th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2017
Country/Territory: United States
City: Washington
Period: 06/26/17 to 06/30/17
