Blackcomb 2: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems

  • Kuno, Harumi H. (PI)
  • Vetter, Jeffrey (CoPI)
  • Mudge, Trevor T. (CoPI)
  • Schreiber, Rob (CoPI)
  • Yuan, Xie X. (CoPI)

Project: Research

Project Details

Description

The impending shift from DRAM and Flash to fast nonvolatile memory (NVM) provides a game-changing opportunity to rethink traditional system architectures; to address their energy inefficiencies; to leverage the new devices for greater performance; and to exploit the new capability of nonvolatility to enhance system resilience, even in the face of larger system scale and degraded reliability of components. Building on the accomplishments of the Blackcomb Project, funded in 2010, we propose to identify, evaluate, and optimize the most promising NVM hardware and software technologies, which are essential to provide the necessary memory capacity, performance, resilience, and energy efficiency in exascale systems.

Capacity and energy are the key drivers. Today's pressing application requirements mandate exascale computing; the datasets and concurrency of these applications demand memory capacity that cannot be met by traditional memory technologies. Although charge-based DRAM will endure, experts foresee only modest DRAM capacity gains. Moreover, because moving data between levels of the hierarchy and storing and accessing data on disk consume significant energy, a more energy-efficient solution is required. We therefore posit that exascale systems will need high-density, energy-efficient storage technologies, such as resistance-based nonvolatile memory (for example, phase-change memory and ReRAM), for access, transformation, and management of exascale data.

To address these issues, in this renewal request we propose to continue the work of our vertically integrated, focused team, where our co-design is informed by proxies and interactions with DOE co-design centers. In software, we start by assuming that nonvolatility requires new abstractions and new infrastructure, allowing persistent objects and delivering performance gains for file systems and robustness through fast checkpointing. In architecture, we evaluate the tradeoffs of performance and energy by employing a scalable and accurate simulation system to analyze realistic proxy applications. Through this co-design process, we rethink the whole memory architecture, from the cell to the array to the device, with ECC, wearout management, and interfaces. Finally, we provide open source software models of these NVM-based memory systems.
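As a rough illustration of the fast-checkpointing idea mentioned above, the sketch below uses an ordinary memory-mapped file as a stand-in for byte-addressable NVM. It is not part of the project's software: the function names `checkpoint` and `restore` and the 8-byte length header are hypothetical choices made only for this example, and a real NVM runtime would need stronger ordering and failure-atomicity guarantees than `flush()` on a regular file provides.

```python
import mmap
import struct

# Hypothetical layout: an 8-byte little-endian length header, then the payload.
HEADER = struct.Struct("<Q")


def checkpoint(path, payload: bytes) -> None:
    """Store application state in a memory-mapped region and flush it."""
    size = HEADER.size + len(payload)
    with open(path, "w+b") as f:
        f.truncate(size)  # size the backing file before mapping it
        with mmap.mmap(f.fileno(), size) as region:
            # Write the payload first and flush it to the backing file.
            region[HEADER.size:size] = payload
            region.flush()
            # Then record the payload length in the header and flush again.
            region[:HEADER.size] = HEADER.pack(len(payload))
            region.flush()


def restore(path) -> bytes:
    """Read back the state saved by checkpoint()."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as region:
            (length,) = HEADER.unpack(region[:HEADER.size])
            return bytes(region[HEADER.size:HEADER.size + length])
```

In this sketch the application state is updated with plain memory stores into the mapped region rather than through explicit file writes, which is the kind of access pattern that persistent-object abstractions over NVM aim to make both fast and safe.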

Going forward, we propose to simplify the organization of our project into these task areas:

  • Nonvolatile Memory Technology and Architecture understands NVM technology (emphasizing ReRAM) and the design and tradeoffs of the memory and storage devices.
  • System Architecture architects the exascale memory system and develops a simulation methodology to characterize its benefits, in capacity, performance, and energy, in a co-design effort.
  • System and Runtime Software builds productive abstractions and interfaces that allow users to best exploit NVM in these new configurations, and develops a robust, fault-tolerant programming model.
  • Applications identifies and characterizes key DOE applications, tracks how NVM may transform the applications, and interacts with stakeholders to build awareness of these opportunities.

Three years of collaboration have formed a team with common interests and backgrounds, a shared vision, a thorough understanding of the technologies, and a growing capacity to use the co-design process and the DOE's resources.

Status: Finished
Effective start/end date: 06/15/14 to 06/14/18

Funding

  • Advanced Scientific Computing Research
