Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices

Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Conference article › peer-review


Abstract

A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small matrices. To solve them, we designed batched BLAS (Basic Linear Algebra Subroutines) routines, in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines. Our batched BLAS design introduces device functions and big-tile settings, and we adopted auto-tuning to optimize different instances of the GEMV routines. We illustrate our batched BLAS approach by progressively optimizing a batched bidiagonalization on a K40c GPU. The optimization techniques in this paper are applicable to other two-sided factorizations as well.

Original language: English
Pages (from-to): 1008-1018
Number of pages: 11
Journal: Procedia Computer Science
Volume: 108
DOIs
State: Published - 2017
Event: International Conference on Computational Science, ICCS 2017 - Zurich, Switzerland
Duration: Jun 12 2017 - Jun 14 2017

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The work was also partially supported by Nvidia and NSF under grant #1514406.

Keywords

  • Hardware accelerators
  • Singular Value Problems
  • batched
  • two-sided factorization algorithms
