Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators

Dalal Sukkari, Mark Gates, Mohammed Al Farhan, Hartwig Anzt, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We investigate a new task-based implementation of the polar decomposition on massively parallel systems augmented with multiple GPUs using SLATE. We implement the iterative QR Dynamically-Weighted Halley (QDWH) algorithm, whose building blocks consist mainly of compute-bound matrix operations, allowing high levels of parallelism to be exploited on various hardware architectures, such as NVIDIA, AMD, and Intel GPU-based systems. To achieve both performance and portability, we implement our QDWH-based polar decomposition in the SLATE library, which uses efficient techniques in dense linear algebra, such as 2D block-cyclic data distribution and communication-avoiding algorithms, as well as modern parallel programming approaches, such as dynamic scheduling and communication overlapping, and uses OpenMP tasks to track data dependencies. We report numerical accuracy and performance results. The benchmarking campaign reveals up to an 18-fold performance speedup of the GPU-accelerated implementation compared to the existing state-of-the-art implementation of the polar decomposition.
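
For orientation, the QDWH iteration named in the abstract can be sketched on a single node with NumPy. This is only an illustrative sketch of the QR-based iteration as it is commonly stated in the literature, not the SLATE implementation described in the paper: the function name `qdwh_polar`, its parameters, and the stopping rule are assumptions made for this example. It handles real matrices only and obtains the norm and smallest singular value exactly from dense routines, whereas practical implementations use cheap estimates.

```python
import numpy as np


def qdwh_polar(A, tol=1e-10, max_iter=50):
    """Illustrative QDWH sketch: return (U, H) with A ~= U @ H.

    Hypothetical helper for real m x n matrices with m >= n and full
    column rank. Not the SLATE API.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if m < n:
        raise ValueError("expected a matrix with at least as many rows as columns")

    # Scale so the singular values of X lie in (0, 1]. A production code
    # would estimate the norm instead of computing it exactly.
    alpha = np.linalg.norm(A, 2)
    X = A / alpha

    # Lower bound on the smallest singular value of X. Computed exactly
    # here for simplicity (assumption); clamped to keep the weights finite.
    ell = np.linalg.svd(X, compute_uv=False).min()
    ell = min(max(ell, np.finfo(float).eps), 1.0)

    eye = np.eye(n)
    for _ in range(max_iter):
        # Dynamically chosen weights a, b, c for this step.
        l2 = ell * ell
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0

        # QR-based update: factor the stacked matrix [sqrt(c) * X; I].
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)

        # Propagate the lower bound; it tends to 1 as X becomes orthonormal.
        ell = min(ell * (a + b * l2) / (1.0 + c * l2), 1.0)

        delta = np.linalg.norm(X_new - X, "fro")
        X = X_new
        if delta <= tol:
            break

    U = X
    H = U.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry
    return U, H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 4))
    U, H = qdwh_polar(A)
    print("orthogonality error:", np.linalg.norm(U.T @ U - np.eye(4)))
    print("reconstruction error:", np.linalg.norm(U @ H - A))
```

Each step is dominated by a QR factorization and a matrix product, which is consistent with the abstract's remark that the building blocks are mainly compute-bound matrix operations.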

Original language: English
Title of host publication: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Publisher: Association for Computing Machinery
Pages: 1680-1687
Number of pages: 8
ISBN (Electronic): 9798400707858
State: Published - Nov 12 2023
Event: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States
Duration: Nov 12 2023 → Nov 17 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Country/Territory: United States
City: Denver
Period: 11/12/23 → 11/17/23

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders (with funder number where given):
Office of Science and National Nuclear Security Administration
U.S. Department of Energy
Office of Science (funder number: DE-AC05-00OR22725)

    Keywords

    • Linear algebra
    • QDWH
    • polar decomposition
