Retrospect: Deterministic replay of MPI applications for interactive distributed debugging

Aurelien Bouteiller, George Bosilca, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

24 Scopus citations

Abstract

While high performance computing was eagerly adopted by users as a vehicle for satisfying a growing demand for computational power, some areas are still poorly explored. The MPI paradigm is considered the keystone of the extensive development of HPC infrastructure over the last decade. However, even today users face a lack of tools that help increase the stability of the software stack and of the applications. In this paper we present and evaluate a tool that lets developers further investigate the execution of parallel applications by dynamically moving back and forth in the execution timeline. Based on an unobtrusive message logging mechanism, the tool enforces deterministic replay, leading to a simpler and more efficient way to debug parallel software.

Original language: English
Title of host publication: Recent Advances in Parallel Virtual Machine and Message Passing Interface - 14th European PVM/MPI Users' Group Meeting, Proceedings
Publisher: Springer Verlag
Pages: 297-306
Number of pages: 10
ISBN (Print): 9783540754152
State: Published - 2007
Externally published: Yes
Event: 14th European PVM/MPI Users' Group Meeting on Parallel Virtual Machine and Message Passing Interface - Paris, France
Duration: Sep 30 2007 to Oct 3 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4757 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th European PVM/MPI Users' Group Meeting on Parallel Virtual Machine and Message Passing Interface
Country/Territory: France
City: Paris
Period: 09/30/07 to 10/3/07
