TY - GEN
T1 - Using InfiniBand hardware gather-scatter capabilities to optimize MPI all-to-all
AU - Gainaru, Ana
AU - Graham, Richard L.
AU - Polyakov, Artem
AU - Shainer, Gilad
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/9/25
Y1 - 2016/9/25
N2 - The MPI all-to-all algorithm is a data-intensive, high-cost collective operation used by many scientific High Performance Computing applications. Optimizations for small data exchanges use aggregation techniques, such as the Bruck algorithm, to minimize the number of messages sent and the overall operation latency. This paper presents three variants of the Bruck algorithm, which differ in the way data is laid out in memory at intermediate steps of the algorithm. Mellanox's InfiniBand support for Host Channel Adapter (HCA) hardware scatter/gather is used selectively to replace CPU-based buffer packing and unpacking. Using this offload capability reduces the eight- and sixteen-byte all-to-all latency on 1024 MPI processes by 9.7% and 9.1%, respectively. The optimization accounts for a decrease in the total memory handling time of 40.6% and 57.9%, respectively.
AB - The MPI all-to-all algorithm is a data-intensive, high-cost collective operation used by many scientific High Performance Computing applications. Optimizations for small data exchanges use aggregation techniques, such as the Bruck algorithm, to minimize the number of messages sent and the overall operation latency. This paper presents three variants of the Bruck algorithm, which differ in the way data is laid out in memory at intermediate steps of the algorithm. Mellanox's InfiniBand support for Host Channel Adapter (HCA) hardware scatter/gather is used selectively to replace CPU-based buffer packing and unpacking. Using this offload capability reduces the eight- and sixteen-byte all-to-all latency on 1024 MPI processes by 9.7% and 9.1%, respectively. The optimization accounts for a decrease in the total memory handling time of 40.6% and 57.9%, respectively.
KW - All-to-all
KW - Collective communication
KW - MPI
KW - Network offload
UR - http://www.scopus.com/inward/record.url?scp=84995662375&partnerID=8YFLogxK
U2 - 10.1145/2966884.2966918
DO - 10.1145/2966884.2966918
M3 - Conference contribution
AN - SCOPUS:84995662375
T3 - ACM International Conference Proceeding Series
SP - 167
EP - 179
BT - Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016
PB - Association for Computing Machinery
T2 - 23rd European MPI Users' Group Meeting, EuroMPI 2016
Y2 - 25 September 2016 through 28 September 2016
ER -