Stability and performance of various singular value QR implementations on multicore CPU with a GPU

Ichitaro Yamazaki, Stanimire Tomov, Jack Dongarra

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Singular Value QR (SVQR) can orthonormalize a set of dense vectors with the minimum communication (one global reduction between the parallel processing units, and BLAS-3 to perform most of its local computation). As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many of the current computers, where the communication has become significantly more expensive compared to the arithmetic operations. In this article, we study the stability and performance of various SVQR implementations on multicore CPUs with a GPU. Our focus is on the dense triangular solve, which performs half of the total floating-point operations of SVQR. As a part of this study, we examine an adaptive mixed-precision variant of SVQR, which decides if a lower-precision arithmetic can be used for the triangular solution at runtime without increasing the order of its orthogonality error (though its backward error is significantly greater). If the greater backward error can be tolerated, then our performance results with an NVIDIA Kepler GPU show that the mixed-precision SVQR can obtain a speedup of up to 1.36 over the standard SVQR.
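The structure the abstract describes (one Gram-matrix reduction, a small factorization, then a triangular solve) can be illustrated with a minimal NumPy sketch. This is a textbook-style outline of SVQR under stated assumptions, not the implementation studied in the article: the function name `svqr` and the `solve_dtype` parameter are hypothetical, the precision is chosen by the caller rather than by the article's runtime criterion (which the abstract does not specify), and `np.linalg.solve` stands in for a dedicated BLAS-3 triangular solve.

```python
import numpy as np

def svqr(V, solve_dtype=None):
    """Orthonormalize the columns of V (m x n, m >= n): a sketch of SVQR.

    solve_dtype is a hypothetical knob for this example: when given,
    only the final triangular solve is carried out in that precision.
    """
    # Step 1: Gram matrix. On distributed data this product is the
    # single global reduction mentioned in the abstract.
    B = V.T @ V

    # Step 2: B is symmetric positive semidefinite, so its eigendecomposition
    # B = U diag(s) U^T coincides with its SVD.
    s, U = np.linalg.eigh(B)
    s = np.maximum(s, 0.0)  # clip tiny negative values caused by roundoff

    # Step 3: QR of sqrt(diag(s)) U^T yields an upper triangular R
    # with R^T R = B.
    R = np.linalg.qr(np.sqrt(s)[:, None] * U.T, mode="r")

    # Step 4: triangular solve Q = V R^{-1}, written as R^T Q^T = V^T.
    # A production code would call a triangular solver (TRSM) here;
    # the general solver is used only to keep the sketch NumPy-only.
    dtype = V.dtype if solve_dtype is None else solve_dtype
    Q = np.linalg.solve(R.T.astype(dtype), V.T.astype(dtype)).T
    return Q, R
```

Calling `svqr(V)` keeps every step in the working precision of `V`, while `svqr(V, solve_dtype=np.float32)` mimics the mixed-precision idea by lowering the precision of the solve only. For tall matrices (m much larger than n), the Gram product computed with symmetry exploited and the triangular solve each cost roughly m n^2 floating-point operations, which is consistent with the abstract's remark that the solve accounts for half of the total. The sketch does no scaling or safeguarding of small singular values, so it is only suitable for well-conditioned inputs.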

Original language: English
Article number: a10
Journal: ACM Transactions on Mathematical Software
Volume: 43
Issue number: 2
DOIs
State: Published - Sep 2016
Externally published: Yes

Keywords

  • GPU computation
  • Mixed precision
  • Orthogonalization
