Abstract
The path to exascale computational capabilities in high-performance computing (HPC) systems is challenged by the inadequacy of present software technologies to adapt to the rapid evolution of architectures of supercomputing systems. The constraints of power have driven system designs to include increasingly heterogeneous architectures and diverse memory technologies and interfaces. Future systems are also expected to experience an increased rate of errors, such that applications will no longer be able to assume correct behavior of the underlying machine. To enable the scientific community to succeed in scaling their applications, and to harness the capabilities of exascale systems, we need software strategies that enable explicit management of resilience to errors in the system, in addition to locality of reference in the complex memory hierarchies of future HPC systems. In prior work, we introduced the concept of explicitly reliable memory regions, called havens. Memory management using havens supports reliability management through a region-based approach to memory allocations. Havens enable the creation of robust memory regions, whose resilient behavior is guaranteed by software-based protection schemes. In this paper, we propose language support for havens through type annotations that make the structure of a program’s havens more explicit and convenient for HPC programmers to use. We describe how the extended haven-based memory management model is implemented, and demonstrate the use of the language-based annotations to affect the resiliency of a conjugate gradient solver application.
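The abstract describes havens as memory regions whose resilience comes from a software-based protection scheme. The C sketch below illustrates that general idea only and is not the paper's implementation: every name in it (`demo_region`, `demo_region_alloc`, `demo_region_commit`, `demo_region_verify`) is hypothetical, and the protection scheme (a shadow copy plus an FNV-1a checksum) is an assumption chosen for brevity, not the scheme the authors use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration only: these names are NOT the paper's API. */
typedef struct {
    unsigned char *data;    /* primary storage handed out to the program */
    unsigned char *shadow;  /* redundant copy used for repair (assumed scheme) */
    size_t capacity;        /* total bytes in the region */
    size_t used;            /* bytes handed out so far */
    uint32_t checksum;      /* checksum of data[0..used) at the last commit */
} demo_region;

/* FNV-1a checksum over n bytes. */
static uint32_t demo_checksum(const unsigned char *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Create a zero-filled region. Returns 0 on success, -1 on failure. */
int demo_region_init(demo_region *r, size_t capacity) {
    r->data = calloc(1, capacity);
    r->shadow = calloc(1, capacity);
    if (!r->data || !r->shadow) {
        free(r->data);
        free(r->shadow);
        r->data = r->shadow = NULL;
        return -1;
    }
    r->capacity = capacity;
    r->used = 0;
    r->checksum = demo_checksum(r->data, 0);
    return 0;
}

/* Bump-allocate n bytes (rounded up to 16) from the region, or NULL if full. */
void *demo_region_alloc(demo_region *r, size_t n) {
    size_t aligned = (n + 15u) & ~(size_t)15u;
    if (aligned < n || aligned > r->capacity - r->used) return NULL;
    void *p = r->data + r->used;
    r->used += aligned;
    return p;
}

/* Record the current contents as the known-good state. */
void demo_region_commit(demo_region *r) {
    assert(r->used <= r->capacity);
    memcpy(r->shadow, r->data, r->used);
    r->checksum = demo_checksum(r->data, r->used);
}

/* Returns 0 if intact, 1 if repaired from the shadow copy, -1 if unrecoverable. */
int demo_region_verify(demo_region *r) {
    if (demo_checksum(r->data, r->used) == r->checksum) return 0;
    if (demo_checksum(r->shadow, r->used) != r->checksum) return -1;
    memcpy(r->data, r->shadow, r->used);
    return 1;
}

/* Release the whole region at once, as in region-based memory management. */
void demo_region_destroy(demo_region *r) {
    free(r->data);
    free(r->shadow);
    r->data = r->shadow = NULL;
    r->capacity = r->used = 0;
}
```

In this sketch a caller would allocate its vectors from one region, call `demo_region_commit` after each update it wants protected, and call `demo_region_verify` before reading them back. Grouping allocations by region is what lets a single protection policy cover all data placed in it.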
Original language | English |
---|---|
Title of host publication | Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers |
Editors | Chen Ding, John Criswell, Peng Wu |
Publisher | Springer Verlag |
Pages | 73-87 |
Number of pages | 15 |
ISBN (Print) | 9783319527086 |
State | Published - 2017 |
Event | 29th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2016 - Rochester, United States; Duration: Sep 28 2016 → Sep 30 2016 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10136 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 29th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2016 |
---|---|
Country/Territory | United States |
City | Rochester |
Period | 09/28/16 → 09/30/16 |
Bibliographical note
Publisher Copyright: © Springer International Publishing AG 2017.