Revisiting the double checkpointing algorithm

Jack Dongarra, Thomas Herault, Yves Robert

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Fast checkpointing algorithms require distributed access to stable storage. This paper revisits the approach based upon double checkpointing, and compares the blocking algorithm of Zheng, Shi and Kalé with the non-blocking algorithm of Ni, Meneses and Kalé, in terms of both performance and risk. We also extend their proposed model to assess the impact of the overhead associated with non-blocking communications. We then provide a new peer-to-peer checkpointing algorithm, called the triple checkpointing algorithm, that can work at constant memory and achieves both higher efficiency and better risk handling than the double checkpointing algorithm. We provide performance and risk models for all the evaluated protocols, and compare them through comprehensive simulations.

Original language: English
Title of host publication: Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013
Publisher: IEEE Computer Society
Pages: 706-715
Number of pages: 10
ISBN (Print): 9780769549798
State: Published - 2013
Externally published: Yes
Event: 2013 IEEE 37th Annual Computer Software and Applications Conference, COMPSAC 2013 - Boston, MA, Japan
Duration: Jul 22 2013 → Jul 26 2013

Publication series

Name: Proceedings - IEEE 27th International Parallel and Distributed Processing Symposium Workshops and PhD Forum, IPDPSW 2013

Conference

Conference: 2013 IEEE 37th Annual Computer Software and Applications Conference, COMPSAC 2013
Country/Territory: Japan
City: Boston, MA
Period: 07/22/13 → 07/26/13

Keywords

  • checkpoint
  • in-memory checkpoint
  • performance model
  • scheduling
