Building a wide-area file transfer performance predictor: An empirical study

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    1 Scopus citation

    Abstract

    Wide-area data transfer is central to geographically distributed scientific workflows. Faster delivery of data is important for these workflows. Predictability is equally (or even more) important. With the goal of providing a reasonably accurate estimate of data transfer time to improve resource allocation and scheduling for workflows and enable end-to-end data transfer optimization, we apply machine learning methods to develop predictive models for data transfer times over a variety of wide area networks. To build and evaluate these models, we use 201,388 transfers, involving 759 million files totaling 9 PB transferred, over 115 heavily used source-destination pairs (“edges”) between 135 unique endpoints. We evaluate the models for different retraining frequencies and different window sizes of historical data. In the best case, the resulting models have a median prediction error of ≤21% for 50% of the edges, and ≤32% for 75% of the edges. We present a detailed analysis of these results that provides insights into the cause of some of the high errors. We envision that the performance predictor will be informative for scheduling geo-distributed workflows. The insights also suggest obvious directions for both further analysis and transfer service optimization.

    Original language: English
    Title of host publication: Machine Learning for Networking - 1st International Conference, MLN 2018, Revised Selected Papers
    Editors: Paul Mühlethaler, Selma Boumerdassi, Éric Renault
    Publisher: Springer Verlag
    Pages: 56-78
    Number of pages: 23
    ISBN (Print): 9783030199449
    State: Published - 2019
    Event: 1st International Conference on Machine Learning for Networking, MLN 2018 - Paris, France
    Duration: Nov 27 2018 → Nov 29 2018

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11407 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 1st International Conference on Machine Learning for Networking, MLN 2018
    Country/Territory: France
    City: Paris
    Period: 11/27/18 → 11/29/18

    Funding

    Acknowledgments. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357. We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.
