TY - GEN
T1 - A high-performance parallel algorithm for nonnegative matrix factorization
AU - Kannan, Ramakrishnan
AU - Ballard, Grey
AU - Park, Haesun
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/2/27
Y1 - 2016/2/27
AB - Non-negative matrix factorization (NMF) is the problem of determining two non-negative low-rank factors W and H for a given input matrix A such that A ≈ WH. NMF is a useful tool for many applications in different domains, such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). Unlike previous implementations, our algorithm is also flexible: (1) it performs well for both dense and sparse matrices, and (2) it allows the user to choose any of several algorithms for solving the updates to the low-rank factors W and H within the alternating iterations. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements.
UR - http://www.scopus.com/inward/record.url?scp=84963759259&partnerID=8YFLogxK
U2 - 10.1145/2851141.2851152
DO - 10.1145/2851141.2851152
M3 - Conference contribution
AN - SCOPUS:84963759259
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP
BT - 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016 - Proceedings
PB - Association for Computing Machinery
T2 - 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016
Y2 - 12 March 2016 through 16 March 2016
ER -
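
Note: the abstract above describes the alternating non-negative least squares (ANLS) framework for NMF. The following is a minimal, single-node Python sketch of that framework for illustration only; it is not the paper's distributed MPI algorithm, and the matrix sizes, rank, and iteration count are assumptions chosen for the example. The NLS subproblems are solved here with scipy.optimize.nnls, one row or column at a time.

# Minimal ANLS sketch: alternately fix one factor and solve a non-negative
# least squares (NLS) subproblem for the other, as outlined in the abstract.
import numpy as np
from scipy.optimize import nnls

def anls_nmf(A, k, iters=20, seed=0):
    """Approximate a nonnegative A (m x n) as W @ H with W (m x k) >= 0, H (k x n) >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    H = rng.random((k, n))
    W = np.empty((m, k))
    for _ in range(iters):
        # Fix H, solve min_{W >= 0} ||A - W H||_F: one NNLS problem per row of A.
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
        # Fix W, solve min_{H >= 0} ||A - W H||_F: one NNLS problem per column of A.
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
    return W, H

if __name__ == "__main__":
    A = np.abs(np.random.default_rng(1).random((50, 40)))  # illustrative data
    W, H = anls_nmf(A, k=5)
    print("relative error:", np.linalg.norm(A - W @ H) / np.linalg.norm(A))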