TY - GEN
T1 - Parallel Nonnegative CP Decomposition of Dense Tensors
AU - Ballard, Grey
AU - Hayashi, Koby
AU - Kannan, Ramakrishnan
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensors that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the Matricized-Tensor Times Khatri-Rao Product (MTTKRP) bottleneck subcomputation. The algorithm is computation efficient, using dimension trees to avoid redundant computation across MTTKRPs within the alternating method. Our approach is also communication efficient, using a data distribution and parallel algorithm across a multidimensional processor grid that can be tuned to minimize communication. We benchmark our software on synthetic as well as hyperspectral image and neuroscience dynamic functional connectivity data, demonstrating that our algorithm scales well to 100s of nodes (up to 4096 cores) and is faster and more general than the currently available parallel software.
KW - CP Decomposition
KW - Low-rank Approximation
KW - MTTKRP
KW - Tensor
UR - http://www.scopus.com/inward/record.url?scp=85062823352&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2018.00012
DO - 10.1109/HiPC.2018.00012
M3 - Conference contribution
AN - SCOPUS:85062823352
T3 - Proceedings - 25th IEEE International Conference on High Performance Computing, HiPC 2018
SP - 22
EP - 31
BT - Proceedings - 25th IEEE International Conference on High Performance Computing, HiPC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on High Performance Computing, HiPC 2018
Y2 - 17 December 2018 through 20 December 2018
ER -