TY - GEN
T1 - A single-tree algorithm to compute the Euclidean minimum spanning tree on GPUs
AU - Prokopenko, Andrey
AU - Sao, Piyush
AU - Lebrun-Grandie, Damien
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/8/29
Y1 - 2022/8/29
N2 - Computing the Euclidean minimum spanning tree (Emst) is a computationally demanding step of many algorithms. While work-efficient serial and multithreaded algorithms for computing Emst are known, designing an efficient GPU algorithm is challenging due to a complex branching structure, data dependencies, and load imbalances. In this paper, we propose a single-tree Borůvka-based algorithm for computing Emst on GPUs. We use an efficient nearest neighbor algorithm and reduce the number of the required distance calculations by avoiding traversing subtrees with leaf nodes in the same component. The developed algorithms are implemented in a performance portable way using ArborX, an open-source geometric search library based on the Kokkos framework. We evaluate the proposed algorithm on various 2D and 3D datasets, show and compare it with the current state-of-the-art open-source CPU implementations. We demonstrate 4-24 × speedup over the fastest multi-threaded implementation. We prove the portability of our implementation by providing results on a variety of hardware: AMD EPYC 7763, Nvidia A100 and AMD MI250X. We show scalability of the implementation, computing Emst for 37 million 3D cosmological dataset in under a 0.5 second on a single A100 Nvidia GPU.
AB - Computing the Euclidean minimum spanning tree (Emst) is a computationally demanding step of many algorithms. While work-efficient serial and multithreaded algorithms for computing Emst are known, designing an efficient GPU algorithm is challenging due to a complex branching structure, data dependencies, and load imbalances. In this paper, we propose a single-tree Borůvka-based algorithm for computing Emst on GPUs. We use an efficient nearest neighbor algorithm and reduce the number of the required distance calculations by avoiding traversing subtrees with leaf nodes in the same component. The developed algorithms are implemented in a performance portable way using ArborX, an open-source geometric search library based on the Kokkos framework. We evaluate the proposed algorithm on various 2D and 3D datasets, show and compare it with the current state-of-the-art open-source CPU implementations. We demonstrate 4-24 × speedup over the fastest multi-threaded implementation. We prove the portability of our implementation by providing results on a variety of hardware: AMD EPYC 7763, Nvidia A100 and AMD MI250X. We show scalability of the implementation, computing Emst for 37 million 3D cosmological dataset in under a 0.5 second on a single A100 Nvidia GPU.
KW - Euclidean minimum spanning tree
KW - GPU
KW - parallel algorithm
UR - http://www.scopus.com/inward/record.url?scp=85148577843&partnerID=8YFLogxK
U2 - 10.1145/3545008.3546185
DO - 10.1145/3545008.3546185
M3 - Conference contribution
AN - SCOPUS:85148577843
T3 - ACM International Conference Proceeding Series
BT - 51st International Conference on Parallel Processing, ICPP 2022 - Main Conference Proceedings
PB - Association for Computing Machinery
T2 - 51st International Conference on Parallel Processing, ICPP 2022
Y2 - 29 August 2022 through 1 September 2022
ER -