On the performance and energy efficiency of sparse linear algebra on GPUs

Hartwig Anzt, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper we unveil some of the performance and energy-efficiency frontiers of sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix-vector product (SpMV) implementations taken from libraries such as cuSPARSE and MAGMA for GPUs and Intel's MKL for multicore CPUs, and we develop a GPU sparse matrix-matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak performance of current GPUs, we show that the SpMM overcomes the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study because it combines a mix of sparse and dense linear algebra operations that is typical of complex simulation applications, and it allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against those of a multi-threaded CPU counterpart. The reported performance and energy-efficiency results are indicative of sparse computations on supercomputers.
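The SpMM's advantage can be reasoned about via arithmetic intensity: a CSR SpMV performs roughly 2·nnz flops against roughly 12·nnz bytes of matrix traffic (an 8-byte value plus a 4-byte column index per nonzero), so it stays well below the compute peak, whereas multiplying against a block of nv vectors lets each matrix entry be reused nv times, raising the flop-per-byte ratio with the block size. The sketch below is a deliberately naive CUDA CSR SpMM for illustration only; it is not the paper's MAGMA kernel, and the kernel name, column-major vector layout, and double-precision types are assumptions.

```cuda
// Minimal CSR SpMM sketch: Y = A * X, where A is an n x n matrix in CSR
// format and X holds nv dense vectors stored column-major with leading
// dimension n. One thread computes one entry of Y, i.e. one (row, vector)
// pair. NOTE: this naive version re-reads the matrix data once per vector;
// a blocked kernel such as the paper's reads A once and reuses it for all
// nv vectors, which is the source of the higher arithmetic intensity.
__global__ void spmm_csr(int n, int nv,
                         const int *rowptr, const int *colind,
                         const double *val,
                         const double *X, double *Y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;  // matrix row
    int v   = blockIdx.y;                             // vector within block
    if (row < n && v < nv) {
        double sum = 0.0;
        for (int k = rowptr[row]; k < rowptr[row + 1]; ++k)
            sum += val[k] * X[colind[k] + (size_t)v * n];
        Y[row + (size_t)v * n] = sum;
    }
}
```

A launch such as `spmm_csr<<<dim3((n + 255) / 256, nv), 256>>>(n, nv, rowptr, colind, val, X, Y)` assigns one thread block column per vector; a tuned kernel would instead stage a row's values and column indices once (in registers or shared memory) and apply them across all nv vectors.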

Original language: English
Pages (from-to): 375-390
Number of pages: 16
Journal: International Journal of High Performance Computing Applications
Volume: 31
Issue number: 5
DOI:
State: Published - Sep 1 2017

Bibliographical note

Publisher Copyright:
© 2016 The Author(s).

Keywords

  • GPU supercomputer
  • LOBPCG
  • blocked sparse matrix-vector product
  • energy efficiency
  • sparse eigensolver
