TY - GEN
T1 - UCX
T2 - 23rd IEEE Annual Symposium on High-Performance Interconnects, HOTI 2015
AU - Shamis, Pavel
AU - Venkata, Manjunath Gorentla
AU - Lopez, M. Graham
AU - Baker, Matthew B.
AU - Hernandez, Oscar
AU - Itigin, Yossi
AU - Dubman, Mike
AU - Shainer, Gilad
AU - Graham, Richard L.
AU - Liss, Liran
AU - Shahar, Yiftah
AU - Potluri, Sreeram
AU - Rossetti, Davide
AU - Becker, Donald
AU - Poole, Duncan
AU - Lamb, Christopher
AU - Kumar, Sameer
AU - Stunkel, Craig
AU - Bosilca, George
AU - Bouteiller, Aurelien
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/10/29
Y1 - 2015/10/29
AB - This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly scalable network stack for next-generation applications and systems. The UCX design provides the ability to tailor its APIs and network functionality to suit a wide variety of application domains and hardware. We envision these APIs satisfying the networking needs of many programming models, such as Message Passing Interface (MPI), OpenSHMEM, Partitioned Global Address Space (PGAS) languages, task-based paradigms, and I/O-bound applications. To evaluate the design, we implement the APIs and protocols and measure the performance of overhead-critical network primitives that are fundamental to implementing many parallel programming models and system libraries. Our results show that the latency, bandwidth, and message rate achieved by the portable UCX prototype are very close to those of the underlying driver. With UCX, we achieved a message exchange latency of 0.89 μs, a bandwidth of 6138.5 MB/s, and a message rate of 14 million messages per second. To our knowledge, these are the highest bandwidth and message rate achieved by any publicly known network stack on this hardware.
KW - HPC
KW - Infiniband
KW - Middleware
KW - MPI
KW - OpenSHMEM
KW - PGAS
KW - RDMA
UR - http://www.scopus.com/inward/record.url?scp=84962329834&partnerID=8YFLogxK
U2 - 10.1109/HOTI.2015.13
DO - 10.1109/HOTI.2015.13
M3 - Conference contribution
AN - SCOPUS:84962329834
T3 - Proceedings - 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, HOTI 2015
SP - 40
EP - 43
BT - Proceedings - 2015 IEEE 23rd Annual Symposium on High-Performance Interconnects, HOTI 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 August 2015 through 28 August 2015
ER -