Abstract
In this paper, we present a technique, based on checksum and reverse computation, that enables high-performance matrix operations to be fault-tolerant with low overhead. We have implemented this technique for five matrix operations: matrix multiplication, Cholesky factorization, LU factorization, QR factorization, and Hessenberg reduction. The overhead of checkpointing and recovery is analyzed both theoretically and experimentally. These analyses confirm that our technique can provide fault tolerance for these high-performance matrix operations with low overhead.
| Original language | English |
|---|---|
| Pages | 70-77 |
| Number of pages | 8 |
| State | Published - 1996 |
| Externally published | Yes |
| Event | Proceedings of the 1996 6th Symposium on the Frontiers of Massively Parallel Computing, Frontiers'96, Annapolis, MD, USA, Oct 27 1996 → Oct 31 1996 |
Title
Fault tolerant matrix operations using checksum and reverse computation