Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs

Hartwig Anzt, Moritz Kreutzer, Eduardo Ponce, Gregory D. Peterson, Gerhard Wellein, Jack Dongarra

Research output: Contribution to journal › Article › peer-review


Abstract

In this paper, we present an optimized GPU implementation for the induced dimension reduction algorithm. We improve data locality, combine it with an efficient sparse matrix vector kernel, and investigate the potential of overlapping computation with communication as well as the possibility of concurrent kernel execution. A comprehensive performance evaluation is conducted using a suitable performance model. The analysis reveals efficiency of up to 90%, which indicates that the implementation achieves performance close to the theoretically attainable bound.
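The efficiency figure in the abstract is measured against a roofline performance model, which bounds attainable throughput by the lesser of peak compute rate and memory bandwidth times arithmetic intensity. A minimal Python sketch of that calculation (the hardware numbers and measurement below are hypothetical illustrations, not figures from the paper):

```python
def roofline_bound(peak_flops, bandwidth, intensity):
    """Attainable performance (GFLOP/s) under the roofline model for a
    kernel with the given arithmetic intensity (flop/byte)."""
    return min(peak_flops, bandwidth * intensity)

# Hypothetical GPU: 1500 GFLOP/s double-precision peak, 250 GB/s bandwidth.
peak, bw = 1500.0, 250.0

# Sparse matrix-vector products and fused vector kernels are memory bound
# (low flop/byte ratio), so the bandwidth term dominates the bound.
intensity = 0.25  # e.g. 1 flop per 4 bytes moved (illustrative value)
bound = roofline_bound(peak, bw, intensity)

# Efficiency is measured performance relative to the roofline bound; the
# paper reports up to 90% for the optimized IDR implementation.
measured = 56.25  # hypothetical measured GFLOP/s
efficiency = measured / bound
print(f"bound = {bound} GFLOP/s, efficiency = {efficiency:.0%}")
```

Since memory-bound kernels sit on the sloped part of the roofline, raising arithmetic intensity through kernel fusion (reusing data while it is in registers or cache) directly raises the attainable bound, which is the motivation for the fusion and overlap optimizations the abstract describes.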

Original language: English
Pages (from-to): 220-230
Number of pages: 11
Journal: International Journal of High Performance Computing Applications
Volume: 32
Issue number: 2
DOIs
State: Published - Mar 1 2018

Funding

This material is based upon work supported in part by the US Department of Energy (grant number DE-SC-0010042), the German Research Foundation (DFG) through the Priority Program 1648 (SPPEXA) under project ESSEX, the Air Force Office of Scientific Research (grant number FA9550-12-1-0476), and NVIDIA.

Keywords

  • GPU
  • Induced dimension reduction (IDR)
  • co-design
  • kernel fusion
  • kernel overlap
  • roofline performance model
