Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product

Hartwig Anzt, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal > Conference article > peer-review

27 Scopus citations

Abstract

This paper presents a heterogeneous CPU-GPU implementation of a sparse iterative eigensolver, the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method. For the key routine, which generates the Krylov search spaces via the product of a sparse matrix and a block of vectors, we propose a GPU kernel based on a modified sliced ELLPACK format. Blocking a set of vectors and processing them simultaneously significantly accelerates the computation of a set of consecutive SpMVs. Comparing the performance against similar routines from Intel's MKL and NVIDIA's cuSPARSE library, we identify appealing performance improvements. We integrate the kernel into the highly optimized LOBPCG implementation. Compared to the BLOPEX CPU implementation running on two eight-core Intel Xeon E5-2690s, we accelerate the computation of a small set of eigenvectors using NVIDIA's K40 GPU by typically more than an order of magnitude.
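To make the blocked product concrete, the following is a minimal plain-Python sketch of a sparse matrix times a block of vectors (SpMM) over a simplified sliced ELLPACK layout. It is not the authors' GPU kernel and does not reproduce their modified format: the function names `to_sliced_ell` and `spmm_sliced_ell`, the padding convention (column 0, value 0.0), and the row-major layout of the vector block are all assumptions made for illustration. The point it shows is that each stored matrix entry is read once and reused for every vector in the block.

```python
def to_sliced_ell(dense, slice_size):
    """Convert a dense list-of-lists matrix into a simple sliced ELLPACK layout.

    Hypothetical helper for illustration: rows are grouped into slices of
    `slice_size` rows, and each slice is padded only to its own longest row.
    """
    slices = []
    for start in range(0, len(dense), slice_size):
        rows = dense[start:start + slice_size]
        # Nonzeros of each row as (column, value) pairs.
        nz = [[(j, v) for j, v in enumerate(row) if v != 0] for row in rows]
        width = max((len(r) for r in nz), default=0)
        # Column-major storage inside the slice: entry [p][i] is the p-th
        # nonzero of local row i. Padding uses column 0 with value 0.0.
        cols = [[r[p][0] if p < len(r) else 0 for r in nz] for p in range(width)]
        vals = [[r[p][1] if p < len(r) else 0.0 for r in nz] for p in range(width)]
        slices.append((start, cols, vals))
    return slices


def spmm_sliced_ell(slices, X, n_rows):
    """Compute Y = A @ X, where X is a block of k vectors stored row by row."""
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for start, cols, vals in slices:
        for col_p, val_p in zip(cols, vals):
            for i, (j, a) in enumerate(zip(col_p, val_p)):
                y_row = Y[start + i]
                x_row = X[j]
                # One matrix entry `a` is loaded once and applied to all k vectors.
                for c in range(k):
                    y_row[c] += a * x_row[c]
    return Y


A = [[2, 0, 1],
     [0, 3, 0],
     [4, 0, 5]]
X = [[1, 2],
     [3, 4],
     [5, 6]]  # two vectors of length 3, stored as rows of a 3 x 2 block
Y = spmm_sliced_ell(to_sliced_ell(A, slice_size=2), X, n_rows=3)
```

Computing the same result with k separate SpMV calls would traverse the matrix k times; the blocked loop traverses it once, which is the source of the speedup the abstract attributes to blocking.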

Original language: English
Pages (from-to): 75-82
Number of pages: 8
Journal: Simulation Series
Volume: 47
Issue number: 4
State: Published - 2015
Externally published: Yes
Event: 23rd High Performance Computing Symposium, HPC 2015, Part of the 2015 Spring Simulation Multi-Conference, SpringSim 2015 - Alexandria, United States
Duration: Apr 12 2015 to Apr 15 2015

Funding

Funder: National Science Foundation
Funder number: ACI-1339S22

    Keywords

    • GPU acceleration
    • LOBPCG eigensolver
    • SpMM
    • SpMV
