TY - GEN
T1 - Distributed Multi-GPU Community Detection on Exascale Computing Platforms
AU - Sattar, Naw Safrin
AU - Lu, Hao
AU - Wang, Feiyi
AU - Halappanavar, Mahantesh
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Community detection is a fundamental operation in graph mining. By uncovering hidden structures and patterns within complex systems, it helps solve fundamental problems pertaining to social networks, such as information diffusion, epidemics, and recommender systems. Scaling graph algorithms to massive networks is challenging on modern distributed-memory multi-GPU (Graphics Processing Unit) systems because of irregular memory access patterns, load imbalance, high communication-to-computation ratios, and limited cross-platform support. We present a novel algorithm, HiPDPL-GPU (Distributed Parallel Louvain), to address these challenges. We experiment with different partitioning techniques to optimize the performance of HiPDPL-GPU on the two largest supercomputers: Frontier and Summit. Remarkably, HiPDPL-GPU processes a graph with 4.2 billion edges in less than 3 minutes using 1024 GPUs. Qualitatively, HiPDPL-GPU performs similarly to or better than other state-of-the-art CPU- and GPU-based implementations. While prior GPU implementations have predominantly employed CUDA, our first-of-its-kind implementation for community detection is cross-platform, accommodating both AMD and NVIDIA GPUs.
AB - Community detection is a fundamental operation in graph mining. By uncovering hidden structures and patterns within complex systems, it helps solve fundamental problems pertaining to social networks, such as information diffusion, epidemics, and recommender systems. Scaling graph algorithms to massive networks is challenging on modern distributed-memory multi-GPU (Graphics Processing Unit) systems because of irregular memory access patterns, load imbalance, high communication-to-computation ratios, and limited cross-platform support. We present a novel algorithm, HiPDPL-GPU (Distributed Parallel Louvain), to address these challenges. We experiment with different partitioning techniques to optimize the performance of HiPDPL-GPU on the two largest supercomputers: Frontier and Summit. Remarkably, HiPDPL-GPU processes a graph with 4.2 billion edges in less than 3 minutes using 1024 GPUs. Qualitatively, HiPDPL-GPU performs similarly to or better than other state-of-the-art CPU- and GPU-based implementations. While prior GPU implementations have predominantly employed CUDA, our first-of-its-kind implementation for community detection is cross-platform, accommodating both AMD and NVIDIA GPUs.
KW - clustering
KW - community detection
KW - HIP
KW - hybrid
KW - Louvain
KW - MPI
KW - multi-GPU
UR - http://www.scopus.com/inward/record.url?scp=85200721812&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW63119.2024.00147
DO - 10.1109/IPDPSW63119.2024.00147
M3 - Conference contribution
AN - SCOPUS:85200721812
T3 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
SP - 815
EP - 824
BT - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Y2 - 27 May 2024 through 31 May 2024
ER -