Skip to main navigation Skip to search Skip to main content

Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation

  • Jan Laukemann
  • , Ahmed E. Helal
  • , S. Isaac Geronimo Anderson
  • , Fabio Checconi
  • , Yongseok Soh
  • , Jesmin Jahan Tithi
  • , Teresa Ranadive
  • , Brian J. Gravelle
  • , Fabrizio Petrini
  • , Jee Choi

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

High-dimensional sparse data emerge in many critical application domains such as healthcare and cybersecurity. To extract meaningful insights from massive volumes of these multi-dimensional data, scientists employ unsupervised analysis tools based on tensor decomposition (TD) methods. However, real-world sparse tensors exhibit highly irregular shapes and data distributions, which pose significant challenges for making efficient use of modern parallel processors. This study breaks the prevailing assumption that compressing sparse tensors into coarse-grained structures (i.e., tensor slices or blocks) or along a particular dimension/mode (i.e., mode-specific) is more efficient than keeping them in a fine-grained, mode-agnostic form. Our novel sparse tensor representation, Adaptive Linearized Tensor Order ( ALTO), encodes tensors in a compact format that can be easily streamed from memory and is amenable to both caching and parallel execution. In contrast to existing compressed tensor formats, ALTO constructs one tensor copy that is agnostic to both the mode orientation and the irregular distribution of nonzero elements. To demonstrate the efficacy of ALTO , we accelerate popular TD methods that compute the Canonical Polyadic Decomposition (CPD) model across different types of sparse tensors. We propose a set of parallel TD algorithms that exploit the inherent data reuse of tensor computations to substantially reduce synchronization overhead, decrease memory footprint, and improve parallel performance. Additionally, we characterize the major execution bottlenecks of TD methods on multiple generations of the latest Intel Xeon Scalable processors, including Sapphire Rapids CPUs, and introduce dynamic adaptation heuristics to automatically select the best algorithm based on the sparse tensor characteristics. Across a diverse set of real-world data sets, ALTO outperforms the state-of-the-art approaches, achieving more than an order-of-magnitude speedup over the best mode-agnostic formats. Compared to the best mode-specific formats, which require multiple tensor copies, ALTO achieves 5.1× geometric mean speedup at a fraction (25% ) of their storage costs. Moreover, ALTO obtains 8.4 × geometric mean speedup over the state-of-the-art memoization approach, which reduces computations by using extra memory, while requiring 14% of its memory consumption.

Original languageEnglish
Pages (from-to)1025-1041
Number of pages17
JournalIEEE Transactions on Parallel and Distributed Systems
Volume36
Issue number5
DOIs
StatePublished - 2025

Keywords

  • alternating poisson regression
  • ALTO
  • CP-APR
  • MTTKRP
  • multi-core CPU
  • poisson tensor decomposition
  • sapphire rapids
  • Sparse tensors
  • tensor decomposition

Fingerprint

Dive into the research topics of 'Accelerating Sparse Tensor Decomposition Using Adaptive Linearized Representation'. Together they form a unique fingerprint.

Cite this