Efficient quality threshold clustering for parallel architectures

Anthony Danalis, Collin McCurdy, Jeffrey S. Vetter

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

12 Scopus citations

Abstract

Quality Threshold Clustering (QTC) is an algorithm for partitioning data, in fields such as biology, where clustering of large data-sets can aid scientific discovery. Unlike other clustering algorithms, QTC does not require knowing the number of clusters a priori, however, its perceived need for high computing power often makes it an unattractive choice. This paper presents a thorough study of QTC. We analyze the worst case complexity of the algorithm and discuss methods to reduce it by trading memory for computation. We also demonstrate how the expected running time of QTC is affected by the structure of the input data. We describe how QTC can be parallelized, and discuss implementation details of our thread-parallel, GPU, and distributed memory implementations of the algorithm. We demonstrate the efficiency of our implementations through experimental data. We show how data sets with tens of thousands of elements can be clustered in a matter of minutes in a modern GPU, and seconds in a small scale cluster of multi-core CPUs, or multiple GPUs. Finally, we discuss how user selected parameters, as well as algorithmic and implementation choices, affect performance.

Original languageEnglish
Title of host publicationProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Pages1068-1079
Number of pages12
DOIs
StatePublished - 2012
Event2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 - Shanghai, China
Duration: May 21 2012May 25 2012

Publication series

NameProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012

Conference

Conference2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Country/TerritoryChina
CityShanghai
Period05/21/1205/25/12

Keywords

  • GPU
  • QT-clustering
  • complexity
  • distributed
  • multi-core

Fingerprint

Dive into the research topics of 'Efficient quality threshold clustering for parallel architectures'. Together they form a unique fingerprint.

Cite this