Asynchronous iterative algorithm for computing incomplete factorizations on GPUs

Edmond Chow, Hartwig Anzt, Jack Dongarra

Research output: Contribution to journal, Conference article, peer-review

35 Scopus citations

Abstract

This paper presents a GPU implementation of an asynchronous iterative algorithm for computing incomplete factorizations. Asynchronous algorithms, with their ability to tolerate memory latency, form an important class of algorithms for modern computer architectures. Our GPU implementation considers several non-traditional techniques that can be important for asynchronous algorithms to optimize convergence and data locality. These techniques include controlling the order in which variables are updated by controlling the order of execution of thread blocks, taking advantage of cache reuse between thread blocks, and managing the amount of parallelism to control the convergence of the algorithm.
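The abstract does not spell out the iteration itself. As a minimal sequential sketch, assume the fixed-point formulation of ILU(0) associated with Chow and Patel's fine-grained parallel incomplete factorization. In that formulation, each unknown l_ij or u_ij in the sparsity pattern S of A is repeatedly recomputed from the current values of the other unknowns. This is not the authors' GPU code. The function names (`ilu0_pattern`, `initial_guess`, `update_entry`, `sweep`, `residual`) are hypothetical, matrices are dense Python lists for readability, and thread-block scheduling is only emulated by two things: the order of the list of unknowns, and whether new values are read immediately (in place) or only after a full sweep (synchronous).

```python
# Hypothetical sketch: fixed-point sweeps for ILU(0), assuming the Chow-Patel
# formulation. This is not the paper's GPU kernel; matrices are dense lists.
from typing import List, Tuple

Matrix = List[List[float]]


def ilu0_pattern(a: Matrix) -> List[Tuple[int, int]]:
    """Sparsity pattern S of A, listed row by row (this list fixes the update order)."""
    n = len(a)
    return [(i, j) for i in range(n) for j in range(n) if a[i][j] != 0.0]


def initial_guess(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Start from L = unit diagonal + strictly lower part of A, U = upper part of A."""
    n = len(a)
    l = [[1.0 if i == j else (a[i][j] if i > j else 0.0) for j in range(n)] for i in range(n)]
    u = [[a[i][j] if i <= j else 0.0 for j in range(n)] for i in range(n)]
    return l, u


def update_entry(a: Matrix, l: Matrix, u: Matrix, i: int, j: int) -> float:
    """New value of unknown (i, j), computed from the current L and U."""
    if i > j:  # entry of L: a_ij = sum_{k<j} l_ik u_kj + l_ij u_jj
        s = sum(l[i][k] * u[k][j] for k in range(j))
        return (a[i][j] - s) / u[j][j]
    # entry of U (l_ii = 1): a_ij = sum_{k<i} l_ik u_kj + u_ij
    s = sum(l[i][k] * u[k][j] for k in range(i))
    return a[i][j] - s


def sweep(a: Matrix, l: Matrix, u: Matrix, order: List[Tuple[int, int]],
          in_place: bool = True) -> None:
    """Update every unknown once, in the given order.

    in_place=True: later updates read values written earlier in the same sweep
    (Gauss-Seidel-like, a rough stand-in for asynchronous execution).
    in_place=False: every update reads the previous sweep's values (Jacobi-like).
    """
    if in_place:
        src_l, src_u = l, u
    else:
        src_l, src_u = [row[:] for row in l], [row[:] for row in u]
    for i, j in order:
        value = update_entry(a, src_l, src_u, i, j)
        if i > j:
            l[i][j] = value
        else:
            u[i][j] = value


def residual(a: Matrix, l: Matrix, u: Matrix, pattern: List[Tuple[int, int]]) -> float:
    """Sum of |a_ij - (LU)_ij| over S, the quantity an exact ILU(0) drives to zero."""
    n = len(a)
    return sum(abs(a[i][j] - sum(l[i][k] * u[k][j] for k in range(n))) for i, j in pattern)


if __name__ == "__main__":
    n = 6
    a = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    s = ilu0_pattern(a)
    for mode in (True, False):
        l, u = initial_guess(a)
        for k in range(1, 4):
            sweep(a, l, u, s, in_place=mode)
            print("in-place" if mode else "synchronous", k, residual(a, l, u, s))
```

In this small tridiagonal example, the row-by-row order lets a single in-place sweep reproduce the exact ILU(0) factors, while synchronous sweeps need several passes. This is only a toy illustration of why the update order, which the paper controls through thread-block execution order, can affect convergence.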

Original language: English
Article number: A1
Pages (from-to): 1-16
Number of pages: 16
Journal: Lecture Notes in Computer Science
Volume: 9137 LNCS
State: Published - 2015
Externally published: Yes
Event: 30th International Conference on High Performance Computing, ISC 2015 - Frankfurt, Germany
Duration: Jul 12 2015 to Jul 16 2015

Funding

This material is based upon work supported by the U.S. Department of Energy Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC-0012538 and DE-SC-0010042. Support from NVIDIA is also acknowledged.

Funders (funder number):
U.S. Department of Energy
Advanced Scientific Computing Research (DE-SC-0010042, DE-SC-0012538)
