Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization

Lawton Manning, Grey Ballard, Ramakrishnan Kannan, Haesun Park

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Nonnegative Matrix Factorization (NMF) is an effective tool for clustering nonnegative data, either for computing a flat partitioning of a dataset or for determining a hierarchy of similarity. In this paper, we propose a parallel algorithm for hierarchical clustering that uses a divide-and-conquer approach based on rank-two NMF to split a data set into two cohesive parts. Not only does this approach uncover more structure in the data than a flat NMF clustering, but also rank-two NMF can be computed more quickly than for general ranks, providing comparable overall time to solution. Our data distribution and parallelization strategies are designed to maintain computational load balance throughout the data-dependent hierarchy of computation while limiting interprocess communication, allowing the algorithm to scale to large dense and sparse data sets. We demonstrate the scalability of our parallel algorithm in terms of data size (up to 800 GB) and number of processors (up to 80 nodes of the Summit supercomputer), applying the hierarchical clustering approach to hyperspectral imaging and image classification data. Our algorithm for Rank-2 NMF scales perfectly on up to 1000s of cores and the entire hierarchical clustering method achieves 5.9x speedup scaling from 10 to 80 nodes on the 800 GB dataset.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages141-150
Number of pages10
ISBN (Electronic)9780738110356
DOIs
StatePublished - Dec 2020
Event27th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2020 - Virtual, Pune, India
Duration: Dec 16 2020Dec 18 2020

Publication series

NameProceedings - 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics, HiPC 2020

Conference

Conference27th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2020
Country/TerritoryIndia
CityVirtual, Pune
Period12/16/2012/18/20

Funding

ACKNOWLEDGMENT This material is based upon work supported by the National Science Foundation under Grant No. OAC-1642385 and OAC-1642410. This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. This project was partially funded by the Laboratory Director’s Research and Development fund. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy.

Keywords

  • distributed-memory parallel algorithms
  • low-rank approximation
  • scalable clustering

Fingerprint

Dive into the research topics of 'Parallel Hierarchical Clustering using Rank-Two Nonnegative Matrix Factorization'. Together they form a unique fingerprint.

Cite this