On undecidability aspects of resilient computations and implications to exascale

    Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

    Abstract

    Future Exascale computing systems with a large number of processors, memory elements and interconnection links, are expected to experience multiple, complex faults, which affect both applications and operating-runtime systems. A variety of algorithms, frameworks and tools are being proposed to realize and/or verify the resilience properties of computations that guarantee correct results on failure-prone computing systems. We analytically show that certain resilient computation problems in presence of general classes of faults are undecidable, that is, no algorithms exist for solving them. We first show that the membership verification in a generic set of resilient computations is undecidable. We describe classes of faults that can create infinite loops or non-halting computations, whose detection in general is undecidable. We then show certain resilient computation problems to be undecidable by using reductions from the loop detection and halting problems under two formulations, namely, an abstract programming language and Turing machines, respectively. These two reductions highlight different failure effects: the former represents program and data corruption, and the latter illustrates incorrect program execution. These results call for broad-based, well-characterized resilience approaches that complement purely computational solutions using methods such as hardware monitors, co-designs, and system- and application-specific diagnosis codes.

    Original languageEnglish
    Title of host publicationEuro-Par 2014
    Subtitle of host publicationParallel Processing Workshops - Euro-Par 2014 International Workshops, Revised Selected Papers
    EditorsLuís Lopes
    PublisherSpringer Verlag
    Pages511-522
    Number of pages12
    EditionPart 1
    ISBN (Electronic)9783319143248
    DOIs
    StatePublished - 2014

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    NumberPart 1
    Volume8805
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Keywords

    • Exascale systems
    • Resilient computations
    • Uncomputability
    • Undecidability

    Fingerprint

    Dive into the research topics of 'On undecidability aspects of resilient computations and implications to exascale'. Together they form a unique fingerprint.

    Cite this