Abstract
We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). We describe our parallel distributions and algorithms, which are careful to avoid unnecessary communication and computation, show how to extend the software to include new algorithms and/or constraints, and report efficiency and scalability results for both synthetic and real-world data sets.
Original language | English |
---|---|
Article number | 20 |
Journal | ACM Transactions on Mathematical Software |
Volume | 47 |
Issue number | 3 |
DOIs | |
State | Published - Jun 2021 |
Funding
Eswar and Hayashi share first authorship. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). This project was partially funded by the Laboratory Director’s Research and Development fund. This material is based upon work supported by the National Science Foundation (NSF) under Grants No. OAC-1642385 and No. OAC-1642410. Koby Hayashi acknowledges support from the United States Department of Energy through the Computational Sciences Graduate Fellowship (DOE CSGF) under Grant No. DE-SC0020347. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the DOE Office of Science (SC). This research used resources of the National Energy Research Scientific Computing Center, a SC User Facility supported by the SC under Contract No. DE-AC02-05CH11231. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or DOE. Authors’ addresses: S. Eswar, K. Hayashi, and H. Park, Dept. of CSE, Georgia Institute of Technology, Atlanta, GA 30308; emails: {seswar3, khayashi9, hpark}@gatech.edu; G. Ballard, Dept. of CS, Wake Forest University, Winston-Salem, NC 27109; email: [email protected]; R. Kannan and M. A. Matheson, Oak Ridge National Laboratory, Oak Ridge, TN 37831; emails: {kannanr, mathesonma}@ornl.gov. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor, or affiliate of the United States government. As such, the United States government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for government purposes only. © 2021 Association for Computing Machinery. 0098-3500/2021/06-ART20 $15.00 https://doi.org/10.1145/3432185
Keywords
- Tensor factorization
- communication-avoiding algorithms
- nonnegative least squares