Batch QR Factorization on GPUs: Design, Optimization, and Tuning

Ahmad Abdelfattah, Stan Tomov, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

QR factorization of dense matrices is a ubiquitous tool in high performance computing (HPC). From solving linear systems and least squares problems to eigenvalue problems and singular value decompositions, a high performance QR factorization is fundamental to computer simulations and many applications. More importantly, the QR factorization of a batch of relatively small matrices has attracted considerable attention in sparse direct solvers and low-rank approximations for hierarchical matrices. To address this interest and demand, we have developed, and present here, a high performance batch QR factorization for Graphics Processing Units (GPUs). We present a multi-level blocking strategy that adjusts various algorithmic designs to the size of the input matrices. We also show that following the LAPACK QR design convention, while still useful, is significantly outperformed by unconventional code structures that increase data reuse. The performance results show multi-fold speedups against state-of-the-art libraries on the latest GPU architectures from both NVIDIA and AMD.
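
For readers unfamiliar with the term, the sketch below illustrates what a "batch" QR factorization computes: many small matrices, each factored independently as A = QR. It is a plain CPU reference in Python, not the GPU implementation the abstract describes. It uses modified Gram-Schmidt purely for brevity, whereas LAPACK-style QR is Householder-based. The function names `qr_mgs`, `batch_qr`, and `matmul` are hypothetical and chosen for illustration only, and each input is assumed to be a full-column-rank m x n matrix with m >= n.

```python
# Illustrative CPU reference only; names are hypothetical, and modified
# Gram-Schmidt stands in for the Householder-based QR used by LAPACK.

def qr_mgs(a):
    """Return (q, r) with a = q * r for one m x n matrix (list of rows).

    Assumes m >= n and full column rank, so no diagonal entry of r is zero.
    """
    m, n = len(a), len(a[0])
    # v[j] holds column j of a; it is orthogonalized in place.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in v[j]) ** 0.5
        qj = [x / r[j][j] for x in v[j]]
        q_cols.append(qj)
        # Remove the q_j component from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qj[i] * v[k][i] for i in range(m))
            v[k] = [v[k][i] - r[j][k] * qj[i] for i in range(m)]
    # Convert the list of columns back to a list of rows.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def batch_qr(batch):
    """Factor every matrix in the batch independently."""
    return [qr_mgs(a) for a in batch]


def matmul(x, y):
    """Dense matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]
```

Because the factorizations are independent of one another, a GPU implementation can process many of them concurrently; how that work is blocked and scheduled is the subject of the paper and is not modeled here.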

Original language: English
Title of host publication: Computational Science - ICCS 2022, 22nd International Conference, Proceedings
Editors: Derek Groen, Clélia de Mulatier, Valeria V. Krzhizhanovskaya, Peter M.A. Sloot, Maciej Paszynski, Jack J. Dongarra
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 60-74
Number of pages: 15
ISBN (Print): 9783031087509
State: Published - 2022
Event: 22nd Annual International Conference on Computational Science, ICCS 2022 - London, United Kingdom
Duration: Jun 21 2022 to Jun 23 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13350 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd Annual International Conference on Computational Science, ICCS 2022
Country/Territory: United Kingdom
City: London
Period: 06/21/22 to 06/23/22

Keywords

  • Batch linear algebra
  • GPU computing
  • QR factorization
