Performance portability of a GPU enabled factorization with the DAGuE framework

George Bosilca, Aurelien Bouteiller, Thomas Herault, Pierre Lemarinier, Narapat Ohm Saengpatsa, Stanimire Tomov, Jack J. Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Scopus citations

Abstract

Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, which combine an interconnect, memory with non-uniform access, many-core processors, and accelerators such as GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the GPU subsystem of the DAGuE runtime and assess, on the Cholesky factorization test case, the minimal effort required of a programmer to enable GPU acceleration in the DAGuE framework. The performance achieved by this unchanged code, on a variety of heterogeneous and distributed many-core and GPU resources, demonstrates the desired performance portability.
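
To make the "DAG representation" in the abstract concrete, here is a minimal Python sketch that enumerates the tasks of a tiled Cholesky factorization and derives the dependencies between them from the tiles each task reads and writes. It is an illustration only and is not DAGuE's API or input format. The function names (`tile_cholesky_tasks`, `accesses`, `build_dag`) are hypothetical, and the sketch assumes the common right-looking, lower-triangular tile algorithm built from the standard POTRF, TRSM, SYRK and GEMM kernels.

```python
def tile_cholesky_tasks(nt):
    """List the tasks of a right-looking tiled Cholesky on an nt x nt tile grid,
    in the order a sequential loop nest would execute them."""
    tasks = []
    for k in range(nt):
        tasks.append(("POTRF", k))                # factor diagonal tile (k, k)
        for m in range(k + 1, nt):
            tasks.append(("TRSM", m, k))          # solve panel tile (m, k)
        for m in range(k + 1, nt):
            tasks.append(("SYRK", m, k))          # update diagonal tile (m, m)
            for n in range(k + 1, m):
                tasks.append(("GEMM", m, n, k))   # update off-diagonal tile (m, n)
    return tasks


def accesses(task):
    """Return (tiles read, tile written) for a task. The written tile is also read."""
    kind = task[0]
    if kind == "POTRF":
        k = task[1]
        return [], (k, k)
    if kind == "TRSM":
        m, k = task[1:]
        return [(k, k)], (m, k)
    if kind == "SYRK":
        m, k = task[1:]
        return [(m, k)], (m, m)
    m, n, k = task[1:]                            # GEMM
    return [(m, k), (n, k)], (m, n)


def build_dag(nt):
    """Map each task to the set of tasks it must wait for.
    A task depends on the last writer of every tile it touches."""
    last_writer = {}
    deps = {}
    for task in tile_cholesky_tasks(nt):
        reads, write = accesses(task)
        deps[task] = {last_writer[t] for t in reads + [write] if t in last_writer}
        last_writer[write] = task
    return deps


if __name__ == "__main__":
    for task, preds in build_dag(3).items():
        print(task, "<-", sorted(preds))
```

Tasks with no unmet dependencies can run concurrently, and a runtime is then free to place each ready task on a CPU core or a GPU. That placement decision is the part the paper's GPU subsystem addresses and is not modeled here.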

Original language: English
Title of host publication: Proceedings - 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Pages: 395-402
Number of pages: 8
State: Published - 2011
Event: 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011 - Austin, TX, United States
Duration: Sep 26 2011 → Sep 30 2011

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 2011 IEEE International Conference on Cluster Computing, CLUSTER 2011
Country/Territory: United States
City: Austin, TX
Period: 09/26/11 → 09/30/11

Keywords

  • DAG scheduling
  • GPU
  • cluster
  • linear algebra
