TY - GEN
T1 - A case study of MPI over long distance connections
AU - Rao, Nageswara S.V.
AU - Imam, Neena
AU - Boehm, Swen
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/4
Y1 - 2019/4
AB - Scientific workflows are increasingly being distributed across wide-area networks, and their code executions are expected to span geographically dispersed computing systems. MPI has been used extensively to support communications for distributed computations, typically over compute clusters and high-performance systems within a single facility. We present a case study of the performance of basic MPI operations over long-distance connections, where TCP serves as the underlying transport. We report execution-time measurements of MPI codes that use MPI_Sendrecv operations over emulated 10 Gbps connections with round-trip times of 0-366 ms, the longest of which spans the globe. These measurements demonstrate that basic MPI codes can be sustained over long-distance connections under external packet loss rates of up to 10%. They also highlight the qualitative effects of losses, which manifest as increased execution times resulting from TCP's loss recovery process.
KW - Execution time
KW - MPI
KW - Network measurements
KW - Wide-area networks
UR - http://www.scopus.com/inward/record.url?scp=85073150685&partnerID=8YFLogxK
U2 - 10.1109/SYSCON.2019.8836721
DO - 10.1109/SYSCON.2019.8836721
M3 - Conference contribution
AN - SCOPUS:85073150685
T3 - SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings
BT - SysCon 2019 - 13th Annual IEEE International Systems Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 13th Annual IEEE International Systems Conference, SysCon 2019
Y2 - 8 April 2019 through 11 April 2019
ER -