Automatic blocking of QR and LU factorizations for locality

Qing Yi, Ken Kennedy, Haihang You, Keith Seymour, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

QR and LU factorizations of dense matrices are important linear algebra computations that are widely used in scientific applications. To perform these computations efficiently on modern computers, the factorization algorithms must be blocked when operating on large matrices so that they effectively exploit the deep cache hierarchies prevalent in today's memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Although linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, generating the blocked versions automatically offers further benefits, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present the different blocking strategies that our optimizer can generate and compare the performance of the auto-blocked versions with the manually tuned versions in LAPACK, each using reference BLAS, ATLAS BLAS, and native BLAS specially tuned for the underlying machine architectures.
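
To make concrete what "blocking" means here, the following is a minimal hand-written sketch of a right-looking LU factorization in unblocked and blocked form. It is not the output of the paper's optimizer and does not implement dependence hoisting; it also omits partial pivoting for brevity, so it assumes a matrix (for example, a diagonally dominant one) that needs no row exchanges. The function names `lu_unblocked` and `lu_blocked` and the `block` parameter are hypothetical, chosen only for this illustration.

```python
def lu_unblocked(a):
    """In-place LU without pivoting on a square list-of-lists matrix.

    On return, the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U.
    """
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                    # multiplier L[i][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]      # update whole trailing matrix
    return a


def lu_blocked(a, block):
    """Same factorization, organized by column panels of width `block`.

    Assumption: no pivoting is required (illustration only).
    """
    if block < 1:
        raise ValueError("block must be a positive integer")
    n = len(a)
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # Step 1: factor the current panel (rows k0..n-1, columns k0..k1-1).
        for k in range(k0, k1):
            for i in range(k + 1, n):
                a[i][k] /= a[k][k]
                for j in range(k + 1, k1):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 2: triangular solve for the block row to the right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    a[i][j] -= a[i][k] * a[k][j]
        # Step 3: update the trailing submatrix with a matrix-matrix product.
        # This step dominates the work and reuses the panel while it is in cache.
        for i in range(k1, n):
            for k in range(k0, k1):
                lik = a[i][k]
                for j in range(k1, n):
                    a[i][j] -= lik * a[k][j]
    return a
```

Both functions compute the same packed L and U factors; the blocked form only reorders the updates so that most of the arithmetic falls in step 3, which is the part a tuned matrix-multiply kernel could take over. Calling `lu_blocked(matrix, block)` with a `block` at least as large as the matrix dimension reduces to the unblocked loop nest.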

Original language: English
Title of host publication: Proceedings of the ACM SIGPLAN Workshop on Memory System Performance, MSP 2004
Pages: 12-22
Number of pages: 11
State: Published - 2004
Externally published: Yes
Event: 2nd ACM SIGPLAN Workshop on Memory Systems Performance, MSP 2004 - Washington, DC, United States
Duration: Jun 8 2004 to Jun 8 2004

Publication series

Name: Proceedings of the ACM SIGPLAN Workshop on Memory System Performance, MSP 2004

Conference

Conference: 2nd ACM SIGPLAN Workshop on Memory Systems Performance, MSP 2004
Country/Territory: United States
City: Washington, DC
Period: 06/8/04 to 06/8/04

Keywords

  • Blocking
  • LAPACK
  • LU
  • Locality
  • Loop optimizations
  • QR
