TY - GEN
T1 - Efficient quality threshold clustering for parallel architectures
AU - Danalis, Anthony
AU - McCurdy, Collin
AU - Vetter, Jeffrey S.
PY - 2012
Y1 - 2012
N2 - Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori, however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC. We analyze the worst case complexity of the algorithm and discuss methods to reduce it by trading memory for computation. We also demonstrate how the expected running time of QTC is affected by the structure of the input data. We describe how QTC can be parallelized, and discuss implementation details of our thread-parallel, GPU, and distributed memory implementations of the algorithm. We demonstrate the efficiency of our implementations through experimental data. We show how data sets with tens of thousands of elements can be clustered in a matter of minutes in a modern GPU, and seconds in a small scale cluster of multi-core CPUs, or multiple GPUs. Finally, we discuss how user selected parameters, as well as algorithmic and implementation choices, affect performance.
AB - Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori, however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC. We analyze the worst case complexity of the algorithm and discuss methods to reduce it by trading memory for computation. We also demonstrate how the expected running time of QTC is affected by the structure of the input data. We describe how QTC can be parallelized, and discuss implementation details of our thread-parallel, GPU, and distributed memory implementations of the algorithm. We demonstrate the efficiency of our implementations through experimental data. We show how data sets with tens of thousands of elements can be clustered in a matter of minutes in a modern GPU, and seconds in a small scale cluster of multi-core CPUs, or multiple GPUs. Finally, we discuss how user selected parameters, as well as algorithmic and implementation choices, affect performance.
KW - GPU
KW - QT-clustering
KW - complexity
KW - distributed
KW - multi-core
UR - http://www.scopus.com/inward/record.url?scp=84866853937&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.99
DO - 10.1109/IPDPS.2012.99
M3 - Conference contribution
AN - SCOPUS:84866853937
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 1068
EP - 1079
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -