A Path to Operating System and Runtime Support for Extreme Scale Tools

Project: Research

Project Details

Description

While the hardware and, to some extent, the operating system for extremely large-scale computers have become a reality, several key facilities have lagged behind. This project partially addresses the problem with an approach that allows efficient and scalable access to process control functionality for many thousands (or even hundreds of thousands) of processes, in support of debugging, scheduling, parallel runtime systems, program steering, and system monitoring. The project uses a novel concept called the group file, which provides efficient simultaneous operation on many files, together with a scalable group file system infrastructure based on the Multicast/Reduction Network (MRNet) Tree-Based Overlay Network software. Together these serve as a foundation for scalable tools, including debuggers, system monitoring software, and performance analysis software. The project is a collaboration among researchers at the University of Wisconsin, Madison; Oak Ridge National Laboratory (ORNL); and TotalView Technologies.

Status: Finished
Effective start/end date: 07/1/08 to 06/30/11

Funding

  • U.S. Department of Energy
