Estimation of RTT and loss rate of wide-area connections using MPI measurements

Nageswara S.V. Rao, Neena Imam, Zhengchun Liu, Raj Kettimuthu, Ian Foster

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Scientific computations are expected to be increasingly distributed across wide-area networks, and the Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. These computations should account for certain network parameters to ensure effective execution, for example, by avoiding highly congested and long connections. The execution times of MPI basic operations reflect the connection parameters, including the Round Trip Time (RTT) and loss rate. We describe five machine learning methods to estimate the connection RTT and loss rate using execution times of MPI basic operations. We utilize execution time measurements of MPI_Sendrecv operations collected over emulated 10 Gbps connections with 0-366 ms round-trip times, wherein the longest connection spans the globe, under up to 20% periodic losses. These methods provide disparate estimates of RTT and loss rate: linear and non-linear, and smooth and non-smooth. Our results show that accurate estimates can be generated at low loss rates, but they become inaccurate at loss rates of 10% and higher. Overall, these results constitute a case study of the strengths and limitations of machine learning methods in inferring network-level parameters using application-level measurements.
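As a rough illustration of the idea of inferring RTT from application-level timings, the following Python sketch fits a linear least-squares estimator on synthetic data. It is not one of the paper's five methods and uses none of its measurements: the timing model `synthetic_exec_time_ms`, its `overhead_ms` and `round_trips` parameters, and the training RTT values are all hypothetical, and the toy model ignores losses entirely.

```python
# Toy illustration only: the timing model and every number below are
# assumptions made for this sketch, not data or methods from the paper.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def synthetic_exec_time_ms(rtt_ms, overhead_ms=2.0, round_trips=3.0):
    """Hypothetical loss-free model: execution time grows linearly with RTT."""
    return overhead_ms + round_trips * rtt_ms


# Synthetic training set with RTTs inside the 0-366 ms range from the abstract.
train_rtts = [0.0, 50.0, 100.0, 150.0, 200.0, 250.0, 300.0, 366.0]
train_times = [synthetic_exec_time_ms(r) for r in train_rtts]

# Regress RTT on execution time, so the model maps a timing to an RTT estimate.
slope, intercept = fit_line(train_times, train_rtts)


def estimate_rtt_ms(exec_time_ms):
    """Estimate RTT (ms) from a single execution time (ms)."""
    return slope * exec_time_ms + intercept


if __name__ == "__main__":
    observed = synthetic_exec_time_ms(120.0)
    print(f"estimated RTT: {estimate_rtt_ms(observed):.3f} ms")
```

Because the synthetic timings are exactly linear in RTT, this estimator recovers the RTT almost perfectly. Real measurements under losses would not behave this way, which is consistent with the abstract's finding that estimates degrade at loss rates of 10% and higher.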

Original language: English
Title of host publication: Proceedings of 6th Annual International Workshop on Innovating the Network for Data Intensive Science, INDIS 2019 - Held in conjunction with SC 2019
Subtitle of host publication: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 17-24
Number of pages: 8
ISBN (Electronic): 9781728166667
State: Published - Nov 2019
Event: 6th Annual International Workshop on Innovating the Network for Data Intensive Science, INDIS 2019 - Denver, United States
Duration: Nov 17 2019 to Nov 17 2019

Publication series

Name: Proceedings of 6th Annual International Workshop on Innovating the Network for Data Intensive Science, INDIS 2019 - Held in conjunction with SC 2019: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 6th Annual International Workshop on Innovating the Network for Data Intensive Science, INDIS 2019
Country/Territory: United States
City: Denver
Period: 11/17/19 to 11/17/19

Funding

This work is funded by the RAMSES project, Office of Advanced Computing Research, U.S. Department of Energy, and by the Extreme Scale Systems Center, sponsored by the U.S. Department of Defense, and was performed at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • Execution time
  • Loss rate
  • MPI
  • Network measurements
  • RTT
  • Wide-area networks
