On undecidability aspects of resilient computations and implications to exascale

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Future Exascale computing systems with a large number of processors, memory elements and interconnection links, are expected to experience multiple, complex faults, which affect both applications and operating-runtime systems. A variety of algorithms, frameworks and tools are being proposed to realize and/or verify the resilience properties of computations that guarantee correct results on failure-prone computing systems. We analytically show that certain resilient computation problems in presence of general classes of faults are undecidable, that is, no algorithms exist for solving them. We first show that the membership verification in a generic set of resilient computations is undecidable. We describe classes of faults that can create infinite loops or non-halting computations, whose detection in general is undecidable. We then show certain resilient computation problems to be undecidable by using reductions from the loop detection and halting problems under two formulations, namely, an abstract programming language and Turing machines, respectively. These two reductions highlight different failure effects: the former represents program and data corruption, and the latter illustrates incorrect program execution. These results call for broad-based, well-characterized resilience approaches that complement purely computational solutions using methods such as hardware monitors, co-designs, and system- and application-specific diagnosis codes.

Original languageEnglish
Title of host publicationEuro-Par 2014
Subtitle of host publicationParallel Processing Workshops - Euro-Par 2014 International Workshops, Revised Selected Papers
EditorsLuís Lopes
PublisherSpringer Verlag
Pages511-522
Number of pages12
EditionPart 1
ISBN (Electronic)9783319143248
DOIs
StatePublished - 2014

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
NumberPart 1
Volume8805
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Keywords

  • Exascale systems
  • Resilient computations
  • Uncomputability
  • Undecidability

Fingerprint

Dive into the research topics of 'On undecidability aspects of resilient computations and implications to exascale'. Together they form a unique fingerprint.

Cite this