A case study of MPI over long distance connections

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Scientific workflows are increasingly distributed across wide-area networks, and their code executions are expected to span geographically dispersed computing systems. MPI has been extensively used to support communications for distributed computations, typically over compute clusters and high-performance systems within a single facility. We present a case study of the performance of basic MPI operations over long distance connections, wherein TCP is used for the underlying transport. We present measurements of execution times of MPI codes that utilize MPI_Sendrecv operations over emulated 10 Gbps connections with 0-366 ms round-trip times, including the longest one spanning the globe. The measurements demonstrate that basic MPI codes can be sustained over long distance connections under external packet loss rates of up to 10%. They also highlight the qualitative effects of losses, which manifest as increased execution times as a consequence of TCP's loss recovery process.
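To give a rough sense of scale for the connection parameters quoted in the abstract, the following sketch computes the bandwidth-delay product of a 10 Gbps, 366 ms path and a textbook upper bound on steady-state TCP throughput under random loss. It is not the authors' method: the Mathis et al. approximation, the 1460-byte segment size, and the function names are assumptions introduced here for illustration, and the approximation is known to be crude at loss rates as high as 10%.

```python
import math


def bandwidth_delay_product_bytes(capacity_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this capacity and RTT full."""
    return capacity_bps * rtt_s / 8.0


def mathis_throughput_bps(mss_bytes: int, rtt_s: float,
                          loss_rate: float, capacity_bps: float) -> float:
    """Approximate steady-state TCP throughput, capped at link capacity.

    Uses the Mathis et al. approximation (an assumption, not taken from the
    paper): rate ~ (MSS / RTT) * sqrt(3/2) / sqrt(p).
    """
    if loss_rate <= 0.0 or rtt_s <= 0.0:
        # The approximation is undefined here; fall back to link capacity.
        return capacity_bps
    estimate = (mss_bytes * 8.0 / rtt_s) * math.sqrt(1.5 / loss_rate)
    return min(estimate, capacity_bps)


if __name__ == "__main__":
    capacity = 10e9   # 10 Gbps, as in the abstract
    rtt = 0.366       # 366 ms, the longest emulated round-trip time
    mss = 1460        # hypothetical segment size in bytes

    bdp = bandwidth_delay_product_bytes(capacity, rtt)
    print(f"Bandwidth-delay product: {bdp / 1e6:.1f} MB")

    for p in (0.0001, 0.01, 0.1):
        rate = mathis_throughput_bps(mss, rtt, p, capacity)
        print(f"loss rate {p:g}: approx. {rate / 1e6:.3f} Mbps")
```

Under these assumptions, roughly 457.5 MB must be in flight to fill the longest path, while the loss-limited estimate at a 10% loss rate falls to the order of 0.1 Mbps. This is consistent in direction with the abstract's observation that losses increase execution times, but the numbers are illustrative only and are not measurements from the study.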

Original language: English
Title of host publication: SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538683965
State: Published - Apr 2019
Event: 13th Annual IEEE International Systems Conference, SysCon 2019 - Orlando, United States
Duration: Apr 8 2019 to Apr 11 2019

Publication series

Name: SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings

Conference

Conference: 13th Annual IEEE International Systems Conference, SysCon 2019
Country/Territory: United States
City: Orlando
Period: 04/8/19 to 04/11/19

Funding

This work is funded by the Mathematics of Complex, Distributed, Interconnected Systems Program, Office of Advanced Computing Research, U.S. Department of Energy, and by the Extreme Scale Systems Center, sponsored by the U.S. Department of Defense, and performed at Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders: Funder number
Extreme Scale Systems Center
Office of Advanced Computing Research
U.S. Department of Defense
UT-Battelle: DE-AC05-00OR22725
U.S. Department of Energy
Advanced Scientific Computing Research
Oak Ridge National Laboratory

    Keywords

    • Execution time
    • MPI
    • Network measurements
    • Wide-area networks
