Fault detection in multi-core processors using chaotic maps

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation

Abstract

Exascale systems built using multi-core processors are expected to experience several component faults during code executions lasting for hours. It is important to detect faults in processor cores so that faulty cores can be removed from scheduler pools, nodes with high failure rates can be swapped out, applications can be migrated, and checkpoint recoveries can be initiated. We propose lightweight codes that use chaotic computations and customized threads to detect component faults in multi-core processors. These codes concurrently execute dedicated threads that implement Poincaré and identity maps, which are customized to isolate faults in arithmetic operations, memory elements, and interconnects. Instruction execution errors and local memory errors are detected by threads dedicated to individual processor cores, and errors in inter-processor cross-connects are detected by global-local memory movements. We present preliminary implementation results on 4- and 48-core HP workstations under simulated faults.
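
To illustrate why chaotic computations are attractive for this purpose, here is a minimal sketch, not the authors' code. It assumes the logistic map with r = 4 as a stand-in for the Poincaré maps mentioned in the abstract, and it simulates a fault by flipping one bit of the floating-point state. The function names (`logistic_step`, `flip_bit`, `run_trajectory`, `first_mismatch`) and all parameters are hypothetical. Because a chaotic map amplifies small perturbations, a single corrupted operation makes the trajectory differ from a reference trajectory computed from the same initial value.

```python
import struct


def logistic_step(x, r=4.0):
    """One iteration of the logistic map (assumed stand-in for a Poincaré map)."""
    return r * x * (1.0 - x)


def flip_bit(x, bit):
    """Simulate a hardware fault by flipping one bit of a 64-bit float."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def run_trajectory(x0, steps, fault_at=None, fault_bit=30):
    """Iterate the map; optionally inject a bit flip after step `fault_at`."""
    x = x0
    trajectory = []
    for i in range(steps):
        x = logistic_step(x)
        if i == fault_at:
            x = flip_bit(x, fault_bit)  # injected (simulated) fault
        trajectory.append(x)
    return trajectory


def first_mismatch(reference, observed):
    """Return the index of the first differing value, or None if identical."""
    for i, (a, b) in enumerate(zip(reference, observed)):
        if a != b:
            return i
    return None


if __name__ == "__main__":
    reference = run_trajectory(0.3, 100)
    healthy = run_trajectory(0.3, 100)
    faulty = run_trajectory(0.3, 100, fault_at=20)
    print("healthy core mismatch:", first_mismatch(reference, healthy))
    print("faulty core mismatch:", first_mismatch(reference, faulty))
```

In a multi-core setting, one would presumably pin a thread running such a trajectory to each core and compare the results across cores or against a stored reference; that scheduling layer is omitted here.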

Original language: English
Pages: 27-32
Number of pages: 6
State: Published - 2013
Event: 3rd ACM Workshop on Fault-Tolerance for HPC at eXtreme Scale, FTXS 2013 - New York, NY, United States
Duration: Jun 18 2013 to Jun 18 2013

Conference

Conference: 3rd ACM Workshop on Fault-Tolerance for HPC at eXtreme Scale, FTXS 2013
Country/Territory: United States
City: New York, NY
Period: 06/18/13 to 06/18/13

Keywords

  • chaotic maps
  • exascale systems
  • fault detection
  • multi-core processors
  • resilience
