Evolving the message passing programming model via a fault-tolerant, object-oriented transport layer

  • Jeremiah J. Wilke
  • Hemanth Kolla
  • Keita Teranishi
  • David S. Hollman
  • Janine C. Bennett
  • Nicole Slattengren

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

In this position paper, we argue for improving the fault tolerance of MPI codes by introducing lightweight virtualization into the MPI interface. In particular, we outline key-value store semantics for MPI send/recv calls, creating a far more expressive programming model. The general message-passing semantics and imperative style of MPI application codes would remain essentially unchanged. However, the additional expressiveness of the programming model 1) enables the underlying transport layer to handle fault tolerance more transparently to the application developer, and 2) provides an evolutionary code path towards more declarative, asynchronous programming models. The core contribution of this paper is an initial implementation of the DHARMA transport layer, which provides the new functionality required to support the MPI key-value store model.
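To make the key-value idea more concrete, the following is a minimal Python sketch. It is not the DHARMA API, which this abstract does not describe: the class `KeyValueTransport`, its `put`/`get` methods, and the `produce_halo` task are hypothetical, and threads in a single process stand in for MPI ranks. What it illustrates is that a receiver requests data by a logical key rather than by (source rank, tag), so a replacement producer can satisfy the request after the original producer fails, with no change to the receiving code.

```python
import threading
from typing import Any, Dict, Hashable


class KeyValueTransport:
    """Hypothetical in-memory stand-in for a key-value message layer.

    Messages are addressed by a logical key instead of (source rank, tag),
    so any worker able to produce the value may satisfy a get().
    """

    def __init__(self) -> None:
        self._store: Dict[Hashable, Any] = {}
        self._cond = threading.Condition()

    def put(self, key: Hashable, value: Any) -> None:
        # Publish a value under a key and wake any blocked readers.
        # A repeated put of the same key (e.g. by a restarted producer
        # recomputing deterministic data) simply overwrites the entry.
        with self._cond:
            self._store[key] = value
            self._cond.notify_all()

    def get(self, key: Hashable, timeout: float = 5.0) -> Any:
        # Block until some producer has published the key, or time out.
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._store, timeout):
                raise TimeoutError(f"no value published for key {key!r}")
            return self._store[key]


def produce_halo(transport: KeyValueTransport, step: int, block: int,
                 fail: bool = False) -> None:
    """Hypothetical producer task; fail=True simulates a crash before publishing."""
    if fail:
        return
    transport.put(("halo", step, block),
                  f"boundary data for step {step}, block {block}")


if __name__ == "__main__":
    transport = KeyValueTransport()
    key = ("halo", 3, 7)

    # The original producer "fails"; a replacement worker re-runs the same task.
    original = threading.Thread(target=produce_halo, args=(transport, 3, 7, True))
    original.start()
    original.join()
    replacement = threading.Thread(target=produce_halo, args=(transport, 3, 7))
    replacement.start()

    # The consumer names the data it needs, not the rank that sends it,
    # so it is unaffected by which worker finally publishes the value.
    print(transport.get(key))
    replacement.join()
```

Keying messages by what the data is, rather than by who sends it, is what would let a transport layer reroute or replay work without the application noticing. A real implementation would also need distributed storage, key lifetime management, and failure detection, none of which this sketch attempts.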

Original language: English
Title of host publication: FTXS 2015 - Proceedings of the 2015 Workshop on Fault Tolerance for HPC at eXtreme Scale, Part of HPDC 2015
Publisher: Association for Computing Machinery, Inc
Pages: 41-46
Number of pages: 6
ISBN (Electronic): 9781450335690
State: Published - Jun 15 2015
Externally published: Yes
Event: 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2015 - Portland, United States
Duration: Jun 15 2015 → …

Publication series

Name: FTXS 2015 - Proceedings of the 2015 Workshop on Fault Tolerance for HPC at eXtreme Scale, Part of HPDC 2015

Conference

Conference: 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2015
Country/Territory: United States
City: Portland
Period: 06/15/15 → …

Funding

The authors would like to thank Craig Ulmer, Gary Templet, and Abhinav Vishnu for useful discussions. This work was supported by the U.S. Department of Energy (DOE) National Nuclear Security Administration (NNSA) Advanced Simulation and Computing (ASC) program. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
