Mixed-precision block Gram Schmidt orthogonalization

Ichitaro Yamazaki, Stanimire Tomov, Jakub Kurzak, Jack Dongarra, Jesse Barlow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase the computational cost. When a large number of columns must be orthogonalized, this computational overhead can have a dramatic impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm, which reduce the computational overhead associated with the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7.1× while maintaining about the same order of numerical error.

Original language: English
Title of host publication: Proceedings of ScalA 2015
Subtitle of host publication: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450340113
State: Published - Nov 15 2015
Externally published: Yes
Event: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2015 - Austin, United States
Duration: Nov 15 2015 to Nov 20 2015

Publication series

Name: Proceedings of ScalA 2015: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2015
Country/Territory: United States
City: Austin
Period: 11/15/15 to 11/20/15

Funding

Funder: Directorate for Computer and Information Science and Engineering
Funder number: 1339822

    Keywords

    • GPU Computation
    • Mixed precision
    • Orthogonalization
