TY - GEN
T1 - Argo
T2 - 4th IEEE International Conference on Big Data, Big Data 2016
AU - Zheng, Angen
AU - Labrinidis, Alexandros
AU - Chrysanthis, Panos K.
AU - Lange, Jack
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016
Y1 - 2016
N2 - The increasing popularity and ubiquity of large graph datasets have renewed interest in graph partitioning. Existing graph partitioners either scale poorly to large graphs or disregard the impact of the underlying hardware topology. A few solutions have shown that nonuniform network communication costs can greatly affect performance. However, none of them considers the impact of resource contention on the memory subsystems (e.g., the last-level cache and memory controller) of modern multicore clusters. They all neglect the fact that the bandwidth of modern high-speed networks (e.g., InfiniBand) has become comparable to that of the memory subsystems. In this paper, we provide an in-depth analysis, both theoretical and experimental, of the contention issue for distributed workloads. We find that the slowdown caused by contention can be as high as 11x. We then design an architecture-aware graph partitioner, Argo, that allows full use of all cores of multicore machines without suffering from either contention or communication heterogeneity. Our experimental study shows (1) the effectiveness of Argo, which achieves speedups of up to 12x on three classic workloads: Breadth-First Search, Single-Source Shortest Path, and PageRank; and (2) the scalability of Argo in terms of both graph size and the number of partitions on two billion-edge real-world graphs.
KW - Contention
KW - Distributed Graph Processing
KW - Graph Partitioning
KW - Heterogeneity
KW - Multicore
UR - https://www.scopus.com/pages/publications/85015209342
U2 - 10.1109/BigData.2016.7840614
DO - 10.1109/BigData.2016.7840614
M3 - Conference contribution
AN - SCOPUS:85015209342
T3 - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
SP - 284
EP - 293
BT - Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
A2 - Joshi, James
A2 - Karypis, George
A2 - Liu, Ling
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Xia, Yinglong
A2 - Xu, Weijia
A2 - Sato, Aki-Hiro
A2 - Rachuri, Sudarsan
A2 - Ungar, Lyle
A2 - Yu, Philip S.
A2 - Govindaraju, Rama
A2 - Suzumura, Toyotaro
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 December 2016 through 8 December 2016
ER -