Accelerated Constrained Sparse Tensor Factorization on Massively Parallel Architectures

Yongseok Soh, Ramakrishnan Kannan, Piyush Sao, Jee Choi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This study presents the first constrained sparse tensor factorization (cSTF) framework that optimizes and fully offloads computation to massively parallel GPU architectures, and the first performance characterization of cSTF on GPU architectures. In contrast to prior work on tensor factorization, where the matricized tensor times Khatri-Rao product (MTTKRP) is the primary performance bottleneck, our systematic analysis of the cSTF algorithm on GPUs reveals that adding constraints creates an additional bottleneck in the update operation for many real-world sparse tensors. While executing the update operation on the GPU brings significant speedup over its CPU counterpart, it remains a significant bottleneck. To further accelerate the update operation, we propose cuADMM, a new update algorithm that leverages algorithmic and code optimization strategies to minimize both computation and data movement on GPUs. As a result, our framework delivers significantly improved performance compared to the prior state of the art. On 10 real-world sparse tensors, our framework achieves a geometric mean speedup of 5.1× (max 41.59×) and 7.01× (max 58.05×) on the NVIDIA A100 and H100 GPUs, respectively, over the state-of-the-art SPLATT library running on a 26-core Intel Ice Lake Xeon CPU.
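
For readers unfamiliar with the MTTKRP operation named in the abstract, the following is a minimal pure-Python sketch of a mode-0 MTTKRP on a 3-way sparse tensor stored in coordinate (COO) form. It is illustrative only and is not the paper's GPU implementation; the function name `mttkrp_mode0`, the COO layout, and the sample values are assumptions made for this example.

```python
def mttkrp_mode0(entries, B, C, num_rows):
    """Mode-0 MTTKRP for a 3-way sparse tensor in coordinate (COO) form.

    entries:  iterable of (i, j, k, value) nonzeros (hypothetical layout)
    B, C:     dense factor matrices as lists of rows, each of length R
    num_rows: size of the tensor's first mode
    Returns M with M[i][r] = sum over nonzeros (i, j, k) of
    value * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    for i, j, k, value in entries:
        row_b, row_c, out = B[j], C[k], M[i]
        for r in range(rank):
            # Elementwise product of the two factor rows, scaled by the nonzero.
            out[r] += value * row_b[r] * row_c[r]
    return M


# Tiny 2 x 2 x 2 tensor with three nonzeros (made-up values).
entries = [(0, 0, 0, 1.0), (0, 1, 1, 2.0), (1, 0, 1, 3.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = mttkrp_mode0(entries, B, C, num_rows=2)
print(M)  # [[47.0, 76.0], [21.0, 48.0]]
```

Each nonzero reads one row from each factor matrix and accumulates into one output row, so the work scales with the number of nonzeros times the rank. This irregular, data-dependent access pattern is why MTTKRP is commonly the dominant cost in unconstrained sparse tensor factorization.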

Original language: English
Title of host publication: 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
Publisher: Association for Computing Machinery
Pages: 107-116
Number of pages: 10
ISBN (Electronic): 9798400708428
State: Published - Aug 12 2024
Event: 53rd International Conference on Parallel Processing, ICPP 2024 - Gotland, Sweden
Duration: Aug 12 2024 to Aug 15 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 53rd International Conference on Parallel Processing, ICPP 2024
Country/Territory: Sweden
City: Gotland
Period: 08/12/24 to 08/15/24

Keywords

  • accelerated sparse tensor factorization
  • algorithm
  • constrained tensor factorization
  • high performance
