TY - GEN
T1 - UD-assisted Multi-path Transport in RDMA
AU - Choi, Mingyu
AU - Lee, Sugi
AU - Kim, Younghoon
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Although Remote Direct Memory Access (RDMA) has become one of the most promising networking technologies in data centers, it is prone to link failures, and several schemes have been proposed that use redundant network paths in RDMA to mitigate the problem. However, these schemes still do not scale well because they rely on a connection-based transport mode, which uses more cache memory in RNICs than a datagram-based mode. In this paper, we propose a novel UD-based multi-path transport in RDMA (MPRUD). MPRUD not only addresses the scalability problem by employing UD QPs, which occupy only a small amount of cache memory, but also achieves robustness through failure detection and recovery algorithms. Our evaluation shows that MPRUD can achieve line-rate bandwidth by utilizing multiple paths and can recover from link failures about 72x faster than a default single-path flow, significantly decreasing data loss to 0.12% (from 4.5GB to 5.6MB).
KW - Multi-Path
KW - RDMA
KW - Unreliable Datagram
UR - http://www.scopus.com/inward/record.url?scp=85143257124&partnerID=8YFLogxK
U2 - 10.1109/ICTC55196.2022.9952631
DO - 10.1109/ICTC55196.2022.9952631
M3 - Conference contribution
AN - SCOPUS:85143257124
T3 - International Conference on ICT Convergence
SP - 127
EP - 129
BT - ICTC 2022 - 13th International Conference on Information and Communication Technology Convergence
PB - IEEE Computer Society
T2 - 13th International Conference on Information and Communication Technology Convergence, ICTC 2022
Y2 - 19 October 2022 through 21 October 2022
ER -