A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems

Piyush Sao, Xing Liu, Richard Vuduc, Xiaoye Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Scopus citations

Abstract

This paper presents the first sparse direct solver for distributed memory systems comprising hybrid multicore CPU and Intel Xeon Phi co-processors. It builds on the algorithmic approach of SuperLU-DIST, which is right-looking and statically pivoted. Our contribution is a novel algorithm, called HALO. The name is shorthand for highly asynchronous lazy offload; it refers to the way the algorithm combines highly aggressive use of asynchrony with accelerated offload, lazy updates, and data shadowing (a la halo or ghost zones), all of which serve to hide and reduce communication, whether to local memory, across the network, or over PCIe. We further augment HALO with a model-driven autotuning heuristic that chooses the intra-node division of labor among CPU and Xeon Phi co-processor components. When integrated into SuperLU-DIST and evaluated on a variety of realistic test problems in both single-node and multi-node configurations, the resulting implementation achieves speedups of up to 2.5× over an already efficient multicore CPU implementation, and achieves up to 83% of a machine-specific upper bound that we have estimated. Our analysis quantifies how well our implementation performs and allows us to speculate on the potential speedups that might come from a variety of future improvements to the algorithm and system.

Original language: English
Title of host publication: Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 71-81
Number of pages: 11
ISBN (Electronic): 9781479986484
State: Published - Jul 17 2015
Externally published: Yes
Event: 29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015 - Hyderabad, India
Duration: May 25 2015 → May 29 2015

Publication series

Name: Proceedings - 2015 IEEE 29th International Parallel and Distributed Processing Symposium, IPDPS 2015

Conference

Conference: 29th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2015
Country/Territory: India
City: Hyderabad
Period: 05/25/15 → 05/29/15

Funding

This work was supported in part by the National Science Foundation (NSF) under NSF CAREER award number 0953100 and NSF SI2-SSI award number 1339745.

Keywords

  • Communication-avoiding algorithm
  • GPU
  • Heterogeneous computing
  • MPI
  • OpenMP
  • Sparse Direct Solver
  • Xeon-Phi acceleration
