Optimizing symmetric dense matrix-vector multiplication on GPUs

Rajib Nath, Stanimire Tomov, Tingxing Dong, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

47 Scopus citations

Abstract

GPUs are excellent accelerators for data parallel applications with regular data access patterns. It is challenging, however, to optimize computations with irregular data access patterns on GPUs. One such computation is the Symmetric Matrix Vector product (SYMV) for dense linear algebra. Optimizing the SYMV kernel is important because it forms the basis of fundamental algorithms such as linear solvers and eigenvalue solvers on symmetric matrices. In this work, we present a new algorithm for optimizing the SYMV kernel on GPUs. Our optimized SYMV in single precision delivers up to a 7× speedup over the latest NVIDIA CUBLAS 4.0 library on the GTX 280 GPU. Our SYMV kernel tuned for the Fermi C2050 is 4.5× faster than CUBLAS 4.0 in single precision and 2× faster than CUBLAS 4.0 in double precision. Moreover, the techniques used and described in the paper are general enough to be of interest for developing high-performance GPU kernels beyond the particular case of SYMV.
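
For context, SYMV computes y = alpha*A*x + beta*y where A is symmetric, so only one triangle of A needs to be stored; the sketch below is a minimal, unoptimized CUDA reference kernel that only illustrates the operation and its irregular access pattern. It is not the optimized algorithm proposed in the paper, and the kernel name, storage convention (column-major, lower triangle referenced), and launch configuration are illustrative assumptions.

```cuda
// Minimal reference sketch (not the paper's optimized kernel):
// computes y = alpha*A*x + beta*y for a symmetric n x n matrix A,
// stored column-major with only the lower triangle referenced.
// One thread per element of y.
__global__ void symv_lower_naive(int n, float alpha, const float *A, int lda,
                                 const float *x, float beta, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float sum = 0.0f;
    for (int j = 0; j < n; ++j) {
        // A(i,j) lives in the lower triangle when j <= i;
        // otherwise use symmetry A(i,j) = A(j,i).
        float a = (j <= i) ? A[i + j * lda] : A[j + i * lda];
        sum += a * x[j];
    }
    y[i] = alpha * sum + beta * y[i];
}

// Illustrative launch: symv_lower_naive<<<(n + 255) / 256, 256>>>(n, alpha, dA, n, dx, beta, dy);
```

Because each thread mixes row-wise and column-wise reads of A, the naive kernel has poor memory coalescing; removing this irregularity is what the paper's blocking and autotuning techniques address.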

Original language: English
Title of host publication: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
DOIs
State: Published - 2011
Externally published: Yes
Event: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 - Seattle, WA, United States
Duration: Nov 12, 2011 - Nov 18, 2011

Publication series

Name: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Country/Territory: United States
City: Seattle, WA
Period: 11/12/11 - 11/18/11

Keywords

  • Autotuning
  • GPU
  • Matrix-vector multiplication
  • Pointer redirecting
  • Recursive blocking
  • Symmetric matrix
