TY - GEN
T1 - Chaotic-identity maps for robustness estimation of exascale computations
AU - Rao, Nageswara S.V.
PY - 2012
Y1 - 2012
N2 - Exascale computing systems are expected to consist of millions of components, and current engineering and manufacturing practices cannot guarantee their complete fault-free operation during code executions lasting several hours. Consequently, the outputs of computations executed on them must be quantified with confidence estimates that reflect their failure-free execution. We propose (i) light-weight computational modules that utilize chaotic computations and customized identity maps to detect component failures, and (ii) statistical estimation methods that generate robustness estimates for the system and computations based on the module outputs. The diagnosis modules execute multiple Poincaré and identity maps, which are customized to detect certain classes of failures in the compute nodes and interconnects. We propose statistical methods that generate robustness estimates for the system using the outputs of pipelined chains of diagnosis modules. These diagnosis modules can be inserted into application codes to identify failures and generate confidence estimates for the application outputs. We present proof-of-principle simulation examples to illustrate the proposed approach.
AB - Exascale computing systems are expected to consist of millions of components, and current engineering and manufacturing practices cannot guarantee their complete fault-free operation during code executions lasting several hours. Consequently, the outputs of computations executed on them must be quantified with confidence estimates that reflect their failure-free execution. We propose (i) light-weight computational modules that utilize chaotic computations and customized identity maps to detect component failures, and (ii) statistical estimation methods that generate robustness estimates for the system and computations based on the module outputs. The diagnosis modules execute multiple Poincaré and identity maps, which are customized to detect certain classes of failures in the compute nodes and interconnects. We propose statistical methods that generate robustness estimates for the system using the outputs of pipelined chains of diagnosis modules. These diagnosis modules can be inserted into application codes to identify failures and generate confidence estimates for the application outputs. We present proof-of-principle simulation examples to illustrate the proposed approach.
UR - http://www.scopus.com/inward/record.url?scp=84880913371&partnerID=8YFLogxK
U2 - 10.1109/DSNW.2012.6264667
DO - 10.1109/DSNW.2012.6264667
M3 - Conference contribution
AN - SCOPUS:84880913371
SN - 9781467322645
T3 - Proceedings of the International Conference on Dependable Systems and Networks
BT - 2012 IEEE/IFIP 42nd International Conference on Dependable Systems and Networks Workshops, DSN-W 2012
T2 - 2012 IEEE/IFIP 42nd International Conference on Dependable Systems and Networks Workshops, DSN-W 2012
Y2 - 25 June 2012 through 28 June 2012
ER -