Supercomputing's monster in the closet

Research output: Contribution to journal, Article, peer-review

36 Scopus citations

Abstract

As a child, were you ever afraid that a monster lurking in your bedroom would leap out of the dark and get you? My job at Oak Ridge National Laboratory is to worry about a similar monster, hiding in the steel cabinets of the supercomputers and threatening to crash the largest computing machines on the planet. The monster is something supercomputer specialists call resilience, or rather the lack of resilience. It has bitten several supercomputers in the past. A high-profile example affected what was the second-fastest supercomputer in the world in 2002, a machine called ASCI Q at Los Alamos National Laboratory. When it was first installed at the New Mexico lab, this computer couldn't run more than an hour or so without crashing.

Original language: English
Article number: 7420396
Pages (from-to): 30-35
Number of pages: 6
Journal: IEEE Spectrum
Volume: 53
Issue number: 3
State: Published - Mar 2016