Abstract
In this paper we unveil some of the performance and energy efficiency frontiers for sparse computations on GPU-based supercomputers. We compare the resource efficiency of different sparse matrix-vector product (SpMV) implementations taken from libraries such as cuSPARSE and MAGMA for GPUs and Intel's MKL for multicore CPUs, and develop a GPU sparse matrix-matrix product (SpMM) implementation that handles the simultaneous multiplication of a sparse matrix with a set of vectors in block-wise fashion. While a typical sparse computation such as the SpMV reaches only a fraction of the peak performance of current GPUs, we show that the SpMM succeeds in exceeding the memory-bound limitations of the SpMV. We integrate this kernel into a GPU-accelerated Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) eigensolver. LOBPCG is chosen as a benchmark algorithm for this study because it combines a mix of sparse and dense linear algebra operations that is typical of complex simulation applications, and it allows for hardware-aware optimizations. In a detailed analysis we compare the performance and energy efficiency against a multi-threaded CPU counterpart. The reported performance and energy efficiency results are indicative of the behavior of sparse computations on GPU-accelerated supercomputers.
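The reason an SpMM can exceed the memory-bound ceiling of the SpMV is arithmetic intensity: each nonzero of the sparse matrix is loaded from memory once but participates in one multiply-add per vector in the block, whereas the SpMV uses it for a single multiply-add. The sketch below is a minimal CSR-based CUDA kernel illustrating only this reuse principle; it is not the authors' kernel, and the names (`csr_spmm`, the row-major layout of `X` and `Y`) are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: Y = A * X for a CSR matrix A and a dense block of
// num_vecs vectors X, both X and Y stored row-major (X[row * num_vecs + v]).
// One thread block per matrix row; threads stride over the vector block, so
// every nonzero of A loaded for the row is reused num_vecs times -- the
// source of SpMM's higher arithmetic intensity compared to the SpMV.
__global__ void csr_spmm(int num_rows, int num_vecs,
                         const int    *row_ptr,
                         const int    *col_idx,
                         const double *val,
                         const double *X,   // num_cols x num_vecs, row-major
                         double       *Y)   // num_rows x num_vecs, row-major
{
    int row = blockIdx.x;
    if (row >= num_rows) return;

    for (int v = threadIdx.x; v < num_vecs; v += blockDim.x) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            // val[j] and col_idx[j] are requested at the same address by all
            // threads of the block, so the sparse-matrix traffic is amortized
            // over the whole vector block.
            sum += val[j] * X[col_idx[j] * num_vecs + v];
        }
        Y[row * num_vecs + v] = sum;
    }
}

// Example launch, assuming device pointers are already populated:
//   csr_spmm<<<num_rows, 64>>>(num_rows, num_vecs, row_ptr, col_idx, val, X, Y);
```

For a single vector (num_vecs = 1) this degenerates to an SpMV and every matrix load feeds exactly one multiply-add; as the block of vectors grows, the flops per byte of matrix traffic grow proportionally, which is the block-wise effect the paper exploits.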
| Original language | English |
| --- | --- |
| Pages (from-to) | 375-390 |
| Number of pages | 16 |
| Journal | International Journal of High Performance Computing Applications |
| Volume | 31 |
| Issue number | 5 |
| DOIs | |
| State | Published - Sep 1 2017 |
Bibliographical note
Publisher Copyright: © 2016 The Author(s).
Keywords
- GPU supercomputer
- LOBPCG
- blocked sparse matrix-vector product
- energy efficiency
- sparse eigensolver