TY - GEN
T1 - PANDORA
T2 - 53rd International Conference on Parallel Processing, ICPP 2024
AU - Sao, Piyush
AU - Prokopenko, Andrey
AU - Lebrun-Grandie, Damien
N1 - Publisher Copyright:
© 2024 Public Domain.
PY - 2024/8/12
Y1 - 2024/8/12
N2 - This paper introduces Pandora, a parallel algorithm for computing dendrograms, the hierarchical cluster trees for single linkage clustering (SLC). Current parallel approaches construct dendrograms by partitioning a minimum spanning tree and removing edges. However, they struggle with skewed, hard-to-parallelize real-world dendrograms. Consequently, computing dendrograms is the sequential bottleneck in HDBSCAN*[21], a popular SLC variant. Pandora uses recursive tree contraction to address this limitation. Pandora contracts nodes to construct progressively smaller trees. It computes the smallest contracted dendrogram and expands it by inserting contracted edges. This recursive strategy is highly parallel, skew-independent, work-optimal, and well-suited for GPUs and multicores. We develop a performance portable implementation of Pandora in Kokkos[31] and evaluate its performance on multicore CPUs and multi-vendor GPUs (e.g., Nvidia, AMD) for dendrogram construction in HDBSCAN*. Multithreaded Pandora is 2.2x faster than the current best-multithreaded implementation. Our GPU version achieves 6-20x speedup on AMD GPUs and 10-37x on NVIDIA GPUs over multithreaded Pandora. Pandora removes HDBSCAN*'s sequential bottleneck, greatly boosting efficiency, particularly with GPUs.
AB - This paper introduces Pandora, a parallel algorithm for computing dendrograms, the hierarchical cluster trees for single linkage clustering (SLC). Current parallel approaches construct dendrograms by partitioning a minimum spanning tree and removing edges. However, they struggle with skewed, hard-to-parallelize real-world dendrograms. Consequently, computing dendrograms is the sequential bottleneck in HDBSCAN*[21], a popular SLC variant. Pandora uses recursive tree contraction to address this limitation. Pandora contracts nodes to construct progressively smaller trees. It computes the smallest contracted dendrogram and expands it by inserting contracted edges. This recursive strategy is highly parallel, skew-independent, work-optimal, and well-suited for GPUs and multicores. We develop a performance portable implementation of Pandora in Kokkos[31] and evaluate its performance on multicore CPUs and multi-vendor GPUs (e.g., Nvidia, AMD) for dendrogram construction in HDBSCAN*. Multithreaded Pandora is 2.2x faster than the current best-multithreaded implementation. Our GPU version achieves 6-20x speedup on AMD GPUs and 10-37x on NVIDIA GPUs over multithreaded Pandora. Pandora removes HDBSCAN*'s sequential bottleneck, greatly boosting efficiency, particularly with GPUs.
UR - http://www.scopus.com/inward/record.url?scp=85202435752&partnerID=8YFLogxK
U2 - 10.1145/3673038.3673148
DO - 10.1145/3673038.3673148
M3 - Conference contribution
AN - SCOPUS:85202435752
T3 - ACM International Conference Proceeding Series
SP - 908
EP - 918
BT - 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 12 August 2024 through 15 August 2024
ER -