Abstract
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem, based on Alternating Nonnegative Least Squares (ANLS), Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms for the case of a single processor grid to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. On large-scale problems, the more sophisticated ANLS and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective. We perform a topic modelling task on a large corpus of academic papers consisting of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
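To make the first-order baseline concrete, here is a minimal single-process sketch of alternating projected gradient descent. It assumes the standard JointNMF objective from the hybrid-clustering literature, f(W, H) = ||X - WH||_F^2 + alpha * ||S - H^T H||_F^2 with W, H >= 0, where X is a features-by-items matrix and S is a symmetric items-by-items connection matrix. The paper's exact formulation, weighting, and step-size rules may differ. The function names, the Armijo backtracking rule, and the curvature-based initial step sizes are illustrative choices. None of the distributed-memory machinery described above is shown.

```python
import numpy as np


def objective(X, S, W, H, alpha):
    """Assumed JointNMF objective: ||X - WH||_F^2 + alpha * ||S - H^T H||_F^2."""
    return (np.linalg.norm(X - W @ H, "fro") ** 2
            + alpha * np.linalg.norm(S - H.T @ H, "fro") ** 2)


def gradients(X, S, W, H, alpha):
    """Gradients of the objective above; the H term assumes S is symmetric."""
    R_x = W @ H - X            # feature residual, m x n
    R_s = S - H.T @ H          # connection residual, n x n
    grad_W = 2.0 * R_x @ H.T
    grad_H = 2.0 * W.T @ R_x - 4.0 * alpha * H @ R_s
    return grad_W, grad_H


def projected_step(f, Z, G, step, shrink=0.5, sigma=1e-4, max_tries=50):
    """One projected gradient step Z -> max(Z - t*G, 0) with Armijo backtracking."""
    f0 = f(Z)
    for _ in range(max_tries):
        Z_new = np.maximum(Z - step * G, 0.0)
        # Sufficient-decrease test along the projected path.
        if f(Z_new) <= f0 + sigma * np.sum(G * (Z_new - Z)):
            return Z_new
        step *= shrink
    return Z  # no acceptable step found; keep the current iterate


def jointnmf_pgd(X, S, k, alpha=1.0, iters=100, seed=0):
    """Alternating projected gradient descent on W, then H (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    history = [objective(X, S, W, H, alpha)]
    for _ in range(iters):
        # W block: f is quadratic in W with Lipschitz constant 2 * ||H H^T||_2.
        gW, _ = gradients(X, S, W, H, alpha)
        L_W = max(2.0 * np.linalg.norm(H @ H.T, 2), 1e-12)
        W = projected_step(lambda Z: objective(X, S, Z, H, alpha), W, gW, 1.0 / L_W)
        # H block: f is quartic in H, so start from a bound on the Hessian
        # at the current H and let backtracking correct it.
        _, gH = gradients(X, S, W, H, alpha)
        L_H = (2.0 * np.linalg.norm(W.T @ W, 2)
               + alpha * (8.0 * np.linalg.norm(H, 2) ** 2
                          + 4.0 * np.linalg.norm(S - H.T @ H, 2)))
        H = projected_step(lambda Z: objective(X, S, W, Z, alpha), H, gH,
                           1.0 / max(L_H, 1e-12))
        history.append(objective(X, S, W, H, alpha))
    return W, H, history


# Usage on small synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((30, 40))          # features x items
A = rng.random((40, 40))
S = (A + A.T) / 2                 # symmetric item-item connections
W, H, hist = jointnmf_pgd(X, S, k=5, alpha=0.5, iters=50)
```

Because every accepted step satisfies the sufficient-decrease test, the recorded objective values never increase. The ANLS and Gauss-Newton variants replace these gradient steps with more expensive subproblem solves that, per the abstract, reduce the objective more effectively on large problems.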
Original language | English |
---|---|
Title of host publication | ACM ICS 2023 - Proceedings of the International Conference on Supercomputing |
Publisher | Association for Computing Machinery |
Pages | 301-312 |
Number of pages | 12 |
ISBN (Electronic) | 9798400700569 |
DOIs | |
State | Published - Jun 21 2023 |
Event | 37th ACM International Conference on Supercomputing, ICS 2023, Orlando, United States (Jun 21 2023 → Jun 23 2023) |
Publication series
Name | Proceedings of the International Conference on Supercomputing |
---|---|
Conference
Conference | 37th ACM International Conference on Supercomputing, ICS 2023 |
---|---|
Country/Territory | United States |
City | Orlando |
Period | 06/21/23 → 06/23/23 |
Funding
This material is based upon work supported by the National Science Foundation (NSF) under Grant Nos. OAC-2106920 and CCF-1942892. It is also based upon work supported by the U.S. Department of Energy, Office of Science under Contract DE-AC02-06CH11357 at Argonne National Laboratory and UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 at Oak Ridge National Laboratory. Additionally, it is based upon work supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research program under Award Number DE-SC-0023296. Koby Hayashi acknowledges support from the United States Department of Energy through the Computational Sciences Graduate Fellowship (DOE CSGF) under Grant No. DE-SC0020347. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DOE or NSF.
Keywords
- high performance computing
- multimodal inputs
- nonnegative matrix factorization