Optimizing memory-bound SYMV kernel on GPU hardware accelerators

Ahmad Abdelfattah, Jack Dongarra, David Keyes, Hatem Ltaief

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

7 Scopus citations

Abstract

Hardware accelerators are becoming ubiquitous in high-performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product on NVIDIA Fermi GPUs. Due to its inherent memory-bound nature, this kernel is critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step to calculate the eigenpairs. Using a novel design that addresses the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the equivalent CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library, in single and double precision arithmetic, respectively.
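For readers unfamiliar with the operation, the symmetric matrix-vector product (SYMV) computes y = alpha*A*x + beta*y for a symmetric matrix A. The following is a minimal sequential reference sketch in Python, not the GPU kernel described in the abstract. The function name `symv_lower` and the list-of-lists storage are assumptions made for illustration. It reads only the lower triangle, so each stored off-diagonal element contributes to two output entries. That access pattern is one plausible source of the irregular memory accesses the abstract mentions.

```python
def symv_lower(n, alpha, A, x, beta, y):
    """Return alpha*A*x + beta*y for a symmetric n-by-n matrix A.

    Only the lower triangle of A (entries A[i][j] with j <= i) is read;
    the strictly upper triangle is never touched.
    """
    out = [beta * yi for yi in y]
    for i in range(n):
        # Diagonal element contributes once.
        out[i] += alpha * A[i][i] * x[i]
        for j in range(i):
            a = A[i][j]
            # A[i][j] acts as row i, column j ...
            out[i] += alpha * a * x[j]
            # ... and, by symmetry, as row j, column i.
            out[j] += alpha * a * x[i]
    return out
```

Each matrix element is loaded once and used for two multiply-adds, so the arithmetic intensity stays low and the run time is dominated by memory traffic rather than computation.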

Original language: English
Title of host publication: High Performance Computing for Computational Science, VECPAR 2012 - 10th International Conference, Revised Selected Papers
Pages: 72-79
Number of pages: 8
State: Published - 2013
Externally published: Yes
Event: 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012 - Kobe, Japan
Duration: Jul 17 2012 to Jul 20 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7851 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
Country/Territory: Japan
City: Kobe
Period: 07/17/12 to 07/20/12
