TY - GEN
T1 - Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
T2 - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
AU - Li, Ang
AU - Song, Shuaiwen Leon
AU - Chen, Jieyang
AU - Liu, Xu
AU - Tallent, Nathan
AU - Barker, Kevin
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/11
Y1 - 2018/12/11
N2 - High-performance multi-GPU computing is becoming an inevitable trend due to the ever-increasing demand for computational capability in emerging domains such as deep learning, big data, and planet-scale applications. However, the lack of a deep understanding of how modern GPUs can be connected, and of the actual impact of state-of-the-art interconnects on multi-GPU application performance, has become a hurdle. Additionally, the absence of a practical multi-GPU benchmark suite poses further obstacles for conducting research in the multi-GPU era. In this paper, we fill the gap by proposing a multi-GPU benchmark suite named Tartan, which contains microbenchmarks as well as scale-up and scale-out applications. We then apply Tartan to evaluate the four latest types of modern GPU interconnects, i.e., PCI-e, NVLink-V1, NVLink-V2, and InfiniBand with GPUDirect-RDMA, on two recently released NVIDIA super-AI platforms as well as ORNL's exascale prototype system. Based on this empirical evaluation, we observe four new types of NUMA effects: three are triggered by NVLink's topology, connectivity, and routing, while one is caused by PCI-e (i.e., anti-locality). These effects are very important for performance tuning in a multi-GPU environment. Our evaluation results show that, unless the current CPU-GPU master-slave programming model can be replaced, it is difficult for scale-up multi-GPU applications to truly benefit from faster intra-node interconnects such as NVLink; for inter-node scale-out applications, although the interconnect is more crucial to overall performance, GPUDirect-RDMA is not always the optimal choice. The Tartan benchmark suite, including the microbenchmarks, is open-source and available at http://github.com/uuudown/Tartan.
AB - High-performance multi-GPU computing is becoming an inevitable trend due to the ever-increasing demand for computational capability in emerging domains such as deep learning, big data, and planet-scale applications. However, the lack of a deep understanding of how modern GPUs can be connected, and of the actual impact of state-of-the-art interconnects on multi-GPU application performance, has become a hurdle. Additionally, the absence of a practical multi-GPU benchmark suite poses further obstacles for conducting research in the multi-GPU era. In this paper, we fill the gap by proposing a multi-GPU benchmark suite named Tartan, which contains microbenchmarks as well as scale-up and scale-out applications. We then apply Tartan to evaluate the four latest types of modern GPU interconnects, i.e., PCI-e, NVLink-V1, NVLink-V2, and InfiniBand with GPUDirect-RDMA, on two recently released NVIDIA super-AI platforms as well as ORNL's exascale prototype system. Based on this empirical evaluation, we observe four new types of NUMA effects: three are triggered by NVLink's topology, connectivity, and routing, while one is caused by PCI-e (i.e., anti-locality). These effects are very important for performance tuning in a multi-GPU environment. Our evaluation results show that, unless the current CPU-GPU master-slave programming model can be replaced, it is difficult for scale-up multi-GPU applications to truly benefit from faster intra-node interconnects such as NVLink; for inter-node scale-out applications, although the interconnect is more crucial to overall performance, GPUDirect-RDMA is not always the optimal choice. The Tartan benchmark suite, including the microbenchmarks, is open-source and available at http://github.com/uuudown/Tartan.
UR - http://www.scopus.com/inward/record.url?scp=85060302573&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2018.8573483
DO - 10.1109/IISWC.2018.8573483
M3 - Conference contribution
AN - SCOPUS:85060302573
T3 - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
SP - 191
EP - 202
BT - 2018 IEEE International Symposium on Workload Characterization, IISWC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 September 2018 through 2 October 2018
ER -