On the performance and energy efficiency of sparse linear algebra on GPUs

Hartwig Anzt, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


In this paper we unveil some performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix-vector products (SpMV) taken from libraries such as cuSPARSE and MAGMA for GPU and Intel's MKL for multicore CPUs, and develop a GPU sparse matrix-matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study as it combines an interesting mix of sparse and dense linear algebra operations that is typical for complex simulation applications, and allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of sparse computations on supercomputers.
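The difference between the SpMV and the blocked SpMM described in the abstract can be illustrated with a minimal CPU-side sketch. This is not the paper's GPU kernel, nor any cuSPARSE, MAGMA, or MKL routine: the function names `csr_spmv` and `csr_spmm`, the CSR storage layout, and the small example matrix are all assumptions chosen for illustration. The sketch only shows the data-reuse idea: in the blocked product, each matrix nonzero is loaded once and applied to every vector in the block, whereas repeated SpMV calls reload the matrix for each vector.

```python
# Illustrative CSR kernels in plain Python (hypothetical names, not the paper's code).

def csr_spmv(row_ptr, col_idx, vals, x):
    """Return y = A @ x for a CSR matrix A and a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[p] * x[col_idx[p]]
        y[i] = acc
    return y


def csr_spmm(row_ptr, col_idx, vals, X):
    """Return Y = A @ X for a CSR matrix A and a block X of k vectors.

    X is stored row-major: X[r][j] is entry r of vector j.
    Each nonzero of A is read once and reused for all k vectors.
    """
    n_rows = len(row_ptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = Y[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[p]              # single load of the matrix entry
            x_row = X[col_idx[p]]    # k contiguous values, one per vector
            for j in range(k):
                row_out[j] += a * x_row[j]
    return Y


# Assumed 3x3 example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]

# Block of k = 2 vectors, stored row-major.
X = [[1.0, 0.0],
     [2.0, 1.0],
     [3.0, 2.0]]

# One blocked product versus k separate SpMV calls.
Y_block = csr_spmm(row_ptr, col_idx, vals, X)
Y_cols = [csr_spmv(row_ptr, col_idx, vals, [row[j] for row in X])
          for j in range(len(X[0]))]
```

Both approaches produce the same numbers; the blocked variant simply traverses `vals` and `col_idx` once instead of once per vector, which is the kind of reuse that lets a blocked kernel do more arithmetic per byte of matrix data moved.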

Original language: English
Pages (from-to): 375-390
Number of pages: 16
Journal: International Journal of High Performance Computing Applications
Issue number: 5
State: Published - Sep 1 2017


The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation (Grant number ACI-1339822), Department of Energy (Grant number DE-SC0010042), and NVIDIA. The work was also funded in part by the Russian Scientific Foundation (agreement N14-11-00190).


  • GPU supercomputer
  • blocked sparse matrix-vector product
  • energy efficiency
  • sparse eigensolver

