A distributed CPU-GPU sparse direct solver

Piyush Sao, Richard Vuduc, Xiaoye Sherry Li

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

34 Scopus citations

Abstract

This paper presents the first hybrid MPI+OpenMP+CUDA implementation of a distributed memory right-looking unsymmetric sparse direct solver (i.e., sparse LU factorization) that uses static pivoting. While BLAS calls can account for more than 40% of the overall factorization time, the difficulty is that small problem sizes dominate the workload, making efficient GPU utilization challenging. This fact motivates our approach, which is to find ways to aggregate collections of small BLAS operations into larger ones; to schedule operations to achieve load balance and hide long-latency operations, such as PCIe transfer; and to exploit simultaneously all of a node's available CPU cores and GPUs.
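The aggregation idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the paper: several small products share the same left operand (here called `l_panel`), so the right-hand blocks (`u_blocks`) can be concatenated column-wise and multiplied in one larger call. The function names and the pure-Python `matmul` stand-in for a BLAS GEMM are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Dense product of two row-major lists of lists (stand-in for one GEMM call)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def aggregated_update(l_panel, u_blocks):
    """Compute l_panel @ block for every block using a single larger product.

    Assumption (hypothetical, for illustration): every block in u_blocks has
    the same number of rows as l_panel has columns.
    """
    inner = len(l_panel[0])
    widths = [len(block[0]) for block in u_blocks]

    # Concatenate the small blocks column-wise into one wide right operand.
    wide = [sum((block[k] for block in u_blocks), []) for k in range(inner)]

    # One large multiplication replaces len(u_blocks) small ones.
    product = matmul(l_panel, wide)

    # Slice the wide result back into per-block results.
    results = []
    start = 0
    for width in widths:
        results.append([row[start:start + width] for row in product])
        start += width
    return results


if __name__ == "__main__":
    l_panel = [[1, 2], [3, 4]]
    u_blocks = [[[1], [0]], [[0, 1], [1, 0]]]
    print(aggregated_update(l_panel, u_blocks))
```

The aggregated call returns the same per-block results as multiplying each block separately. On a GPU, the potential benefit is that one larger operation amortizes kernel-launch and transfer overhead that many small operations would each pay.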

Original language: English
Title of host publication: Euro-Par 2014
Subtitle of host publication: Parallel Processing - 20th International Conference, Proceedings
Editors: Fernando Silva, Inês Dutra, Vítor Santos Costa
Publisher: Springer Verlag
Pages: 487-498
Number of pages: 12
ISBN (Electronic): 9783319098722
ISBN (Print): 9783319098722
State: Published - 2014
Externally published: Yes
Event: 20th International Conference on Parallel Processing, Euro-Par 2014 - Porto, Portugal
Duration: Aug 25 2014 to Aug 29 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8632 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Parallel Processing, Euro-Par 2014
Country/Territory: Portugal
City: Porto
Period: 08/25/14 to 08/29/14
