Abstract
Recently, an algorithm-based approach using diskless checkpointing has been developed to provide fault tolerance for high-performance matrix operations. With this approach, fault tolerance is incorporated into the matrix operations, making them resilient to any single process failure with low overhead. In this paper, we present a technique called multiple checkpointing that enables the matrix operations to tolerate a certain set of multiple processor failures by adding multiple checkpointing processors. Results of implementing this technique on a network of workstations show improvement in both the reliability of the computation and the performance of checkpointing.
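
The idea of adding several checkpointing processors can be pictured with a small simulation. This is a minimal sketch, not the paper's implementation. It assumes, beyond what the abstract states, that the compute processes are split into disjoint groups, that each group has one checkpointing processor holding the element-wise sum of that group's in-memory blocks, and that any set of failures with at most one failed process per group is recoverable. The names `take_checkpoint` and `recover`, and the grouping itself, are hypothetical.

```python
from typing import Dict, List, Set

Block = List[int]  # one process's in-memory data (integers keep the demo exact)


def take_checkpoint(data: Dict[int, Block], groups: List[List[int]]) -> List[Block]:
    """Return one checksum block per group: the element-wise sum of its members.

    Assumption: one checkpointing processor per group (hypothetical layout).
    """
    return [[sum(vals) for vals in zip(*(data[p] for p in group))] for group in groups]


def recover(
    data: Dict[int, Block],
    groups: List[List[int]],
    checkpoints: List[Block],
    failed: Set[int],
) -> Dict[int, Block]:
    """Rebuild the blocks of failed processes from survivors and checksums.

    Recoverable only if each group lost at most one process.
    """
    # Blocks of failed processes are treated as lost and never read.
    restored = {p: block for p, block in data.items() if p not in failed}
    for group, ckpt in zip(groups, checkpoints):
        lost = [p for p in group if p in failed]
        if len(lost) > 1:
            raise ValueError(f"group {group} lost {len(lost)} processes; not recoverable")
        if lost:
            # lost block = group checksum minus every surviving block in the group
            rebuilt = list(ckpt)
            for p in group:
                if p not in failed:
                    rebuilt = [r - x for r, x in zip(rebuilt, data[p])]
            restored[lost[0]] = rebuilt
    return restored
```

With four processes in two groups, for example `groups = [[0, 1], [2, 3]]`, the sketch recovers from the simultaneous loss of processes 1 and 2 but not from the loss of 0 and 1, which share a checkpointing processor. Smaller groups also mean each checksum combines fewer blocks, which is one plausible reading of the reported checkpointing performance gain.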
| Original language | English |
|---|---|
| Pages | 460-465 |
| Number of pages | 6 |
| State | Published - 1997 |
| Externally published | Yes |
| Event | Proceedings of the 1997 2nd High Performance Computing on the Information Superhighway, HPC Asia'97 - Seoul, South Korea; Duration: Apr 28 1997 → May 2 1997 |
Fingerprint
Dive into the research topics of 'Fault tolerant matrix operations for networks of workstations using multiple checkpointing'. Together they form a unique fingerprint.