FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world

Graham E. Fagg, Jack J. Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

218 Scopus citations

Abstract

Initial versions of MPI were designed to work efficiently on multiprocessors which had very little job control and thus static process models. Subsequently forcing them to support dynamic process operations would have affected their performance. As current HPC systems increase in size, with higher potential levels of individual node failure, the need arises for new fault-tolerant systems to be developed. Here we present a new implementation of MPI called FT-MPI that allows the semantics and associated failure modes to be completely controlled by the application. We give an overview of the FT-MPI semantics, design, and some performance issues, as well as the HARNESS g_hcore implementation it is built upon.

Original language: English
Title of host publication: Recent Advances in Parallel Virtual Machine and Message Passing Interface - 7th European PVM/MPI Users’ Group Meeting, Proceedings
Editors: Jack Dongarra, Peter Kacsuk, Norbert Podhorszki
Publisher: Springer Verlag
Pages: 346-353
Number of pages: 8
ISBN (Print): 3540410104, 9783540410102
State: Published - 2000
Externally published: Yes
Event: 7th European Parallel Virtual Machine and Message Passing Interface Users’ Group Meeting, PVM/MPI 2000 - Balatonfured, Hungary
Duration: Sep 10 2000 → Sep 13 2000

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1908
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th European Parallel Virtual Machine and Message Passing Interface Users’ Group Meeting, PVM/MPI 2000
Country/Territory: Hungary
City: Balatonfured
Period: 09/10/00 → 09/13/00
