DAGuE: A generic distributed DAG engine for high performance computing

George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault, Pierre Lemarinier, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

55 Scopus citations

Abstract

The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures has been a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed, many-core, heterogeneous architectures. The applications we consider can be represented as a Directed Acyclic Graph (DAG) of tasks whose labeled edges designate data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a fully distributed fashion. DAGuE assigns computation threads to the cores, overlaps communication with computation, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.
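The key idea in the abstract is that the DAG is never stored explicitly: each task can compute its own predecessors and successors from its indices, on demand, so the description does not grow with the problem size. The Python sketch below illustrates that idea on a tiled Cholesky factorization, which we chose as an example (the abstract only says "a linear algebra factorization"). The task names, the functions `predecessors`, `successors`, `execute` and `run`, and the `priority` heuristic are hypothetical illustrations, not DAGuE's actual API or its dependency-description format. To stay self-contained, tiles are 1x1 scalars and execution is sequential, so there is no distribution, communication, or cache awareness here.

```python
import heapq
import math

# Hypothetical symbolic task graph for a tiled Cholesky factorization
# (lower triangle). A task is a tuple whose last index is the panel k:
#   ("POTRF", k), ("TRSM", m, k), ("SYRK", m, k), ("GEMM", m, n, k)
# Edges are computed from task indices on demand; no graph is stored.

def predecessors(task):
    """Tasks whose output this task reads (incoming data dependencies)."""
    kind, *idx = task
    if kind == "POTRF":
        (k,) = idx
        return [("SYRK", k, k - 1)] if k > 0 else []
    if kind == "TRSM":
        m, k = idx
        return [("POTRF", k)] + ([("GEMM", m, k, k - 1)] if k > 0 else [])
    if kind == "SYRK":
        m, k = idx
        return [("TRSM", m, k)] + ([("SYRK", m, k - 1)] if k > 0 else [])
    m, n, k = idx  # GEMM
    return [("TRSM", m, k), ("TRSM", n, k)] + ([("GEMM", m, n, k - 1)] if k > 0 else [])

def successors(task, n_tiles):
    """Tasks that consume this task's output (outgoing data dependencies)."""
    kind, *idx = task
    if kind == "POTRF":
        (k,) = idx
        return [("TRSM", m, k) for m in range(k + 1, n_tiles)]
    if kind == "TRSM":
        m, k = idx
        return ([("SYRK", m, k)]
                + [("GEMM", m, n, k) for n in range(k + 1, m)]
                + [("GEMM", p, m, k) for p in range(m + 1, n_tiles)])
    if kind == "SYRK":
        m, k = idx
        return [("SYRK", m, k + 1)] if k + 1 < m else [("POTRF", m)]
    m, n, k = idx  # GEMM
    return [("GEMM", m, n, k + 1)] if k + 1 < n else [("TRSM", m, n)]

def execute(task, a):
    """Run one task in place; tiles are 1x1 here, so kernels are scalar ops."""
    kind, *idx = task
    if kind == "POTRF":
        (k,) = idx
        a[k][k] = math.sqrt(a[k][k])
    elif kind == "TRSM":
        m, k = idx
        a[m][k] /= a[k][k]
    elif kind == "SYRK":
        m, k = idx
        a[m][m] -= a[m][k] * a[m][k]
    else:  # GEMM
        m, n, k = idx
        a[m][n] -= a[m][k] * a[n][k]

KIND_RANK = {"POTRF": 0, "TRSM": 1, "SYRK": 2, "GEMM": 3}

def priority(task):
    """Hypothetical heuristic: earlier panels first, then by kernel kind."""
    return (task[-1], KIND_RANK[task[0]])

def run(a, prio):
    """Execute the whole DAG sequentially, releasing tasks as inputs complete."""
    n_tiles = len(a)
    pending = {}  # task -> number of inputs not yet produced
    ready = [(prio(("POTRF", 0)), ("POTRF", 0))]  # the only task with no inputs
    order = []
    while ready:
        _, task = heapq.heappop(ready)
        execute(task, a)
        order.append(task)
        for succ in successors(task, n_tiles):
            if succ not in pending:
                pending[succ] = len(predecessors(succ))
            pending[succ] -= 1
            if pending[succ] == 0:
                heapq.heappush(ready, (prio(succ), succ))
    return order

A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
order = run(A, priority)
print(len(order), [A[i][: i + 1] for i in range(3)])
# prints: 10 [[2.0], [1.0, 2.0], [1.0, 1.0, 2.0]]
```

Because every edge is a function of task indices, the same few functions describe the DAG for any `n_tiles`, and a process could evaluate `successors()` locally to find out which tasks its outputs feed. The single heap above only stands in for the priority aspect of scheduling; the abstract describes a dynamic, fully distributed scheduler that also weighs cache awareness and data locality.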

Original language: English
Title of host publication: 2011 IEEE International Symposium on Parallel and Distributed Processing, Workshops and Phd Forum, IPDPSW 2011
Pages: 1151-1158
Number of pages: 8
State: Published - 2011
Event: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011 - Anchorage, AK, United States
Duration: May 16, 2011 to May 20, 2011

Publication series

Name: IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum

Conference

Conference: 25th IEEE International Parallel and Distributed Processing Symposium, Workshops and Phd Forum, IPDPSW 2011
Country/Territory: United States
City: Anchorage, AK
Period: 05/16/11 to 05/20/11

