Energy efficiency and performance frontiers for sparse computations on GPU supercomputers

Hartwig Anzt, Stanimire Tomov, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art implementations of the sparse matrix-vector (SpMV) product in libraries like cuSPARSE, MKL, and MAGMA, and their use in the LOBPCG eigen-solver. LOBPCG is chosen as a benchmark for this study as it combines an interesting mix of sparse and dense linear algebra operations with potential for hardware-aware optimizations. Most notably, LOBPCG includes a blocking technique that is a common performance optimization for many applications. In particular, multiple memory-bound SpMV operations are blocked into a SpM-matrix product (SpMM), that achieves significantly higher performance than a sequence of SpMVs. We provide details about the GPU kernels we use for the SpMV, SpMM, and the LOBPCG implementation design, and study performance and energy consumption compared to CPU solutions. While a typical sparse computation like the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM achieves up to a 6× performance improvement over the GPU's SpMV, and the GPU-accelerated LOBPCG based on this kernel is 3 to 5× faster than multicore CPUs with the same power draw, e.g., a K40 GPU vs. two Sandy Bridge CPUs (16 cores). In practice though, we show that currently available CPU implementations are much slower due to missed optimization opportunities. These performance results translate to similar improvements in energy consumption, and are indicative of today's frontiers in energy efficiency and performance for sparse computations on supercomputers.
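As an illustration of the blocking idea described in the abstract, the sketch below contrasts a naive one-thread-per-row CSR SpMV kernel with the same pattern extended to a blocked SpMM that applies A to k vectors at once. This is a minimal sketch only, not the MAGMA or cuSPARSE kernels evaluated in the paper; the kernel names, the column-major layout of X and Y, and the MAX_BLOCK bound are assumptions made for the example. It demonstrates why blocking helps: each nonzero of A is loaded from memory once and reused for all k vectors, so one SpMM pass replaces k memory-bound SpMV passes over the matrix.

```cuda
// y = A * x with A in CSR format (row_ptr, col_idx, val); one thread per row.
__global__ void csr_spmv(int n_rows,
                         const int    *row_ptr,
                         const int    *col_idx,
                         const double *val,
                         const double *x,
                         double       *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Y = A * X, where X and Y store k column vectors contiguously
// (column-major, leading dimension n_rows). Each nonzero of A is
// read once and applied to all k vectors, trading k memory-bound
// SpMVs for a single pass over the matrix.
#define MAX_BLOCK 16   // assumed upper bound on the block size k (illustration only)

__global__ void csr_spmm(int n_rows, int k,
                         const int    *row_ptr,
                         const int    *col_idx,
                         const double *val,
                         const double *X,
                         double       *Y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows && k <= MAX_BLOCK) {
        double sum[MAX_BLOCK];
        for (int c = 0; c < k; ++c) sum[c] = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            double a   = val[j];        // nonzero of A fetched once ...
            int    col = col_idx[j];
            for (int c = 0; c < k; ++c) // ... and reused for every vector in the block
                sum[c] += a * X[c * n_rows + col];
        }
        for (int c = 0; c < k; ++c)
            Y[c * n_rows + row] = sum[c];
    }
}
```

With the one-thread-per-row mapping, a launch such as csr_spmm<<<(n_rows + 255) / 256, 256>>>(...) would be typical; in a LOBPCG setting, k corresponds to the block of iteration vectors, which is where the SpMM speedup reported in the abstract comes from.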

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015
Editors: Pavan Balaji, Minyi Guo, Zhiyi Huang
Publisher: Association for Computing Machinery
Pages: 1-10
Number of pages: 10
ISBN (Electronic): 9781450334044
DOIs
State: Published - Feb 7 2015
Event: 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015 - San Francisco Bay Area, United States
Duration: Feb 7 2015 - Feb 8 2015

Publication series

Name: Proceedings of the 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015

Conference

Conference: 6th International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM 2015
Country/Territory: United States
City: San Francisco Bay Area
Period: 02/7/15 - 02/8/15

Keywords

  • Blocked sparse matrix vector product
  • Energy efficiency
  • GPU supercomputer
  • LOBPCG
  • Sparse eigensolver
