Category I: Anvil - A National Composable Advanced Computational Resource for the Future of Science and Engineering

  • Song, Carol (PI)
  • Smith, Preston M (CoPI)
  • Pazouki, Arman (CoPI)
  • Zhu, Xiao (CoPI)
  • Kalyanam, Rajesh (CoPI)
  • Gough, Erik (CoPI)

Project: Research

Project Details

Description

As computing permeates nearly all fields of science and engineering, computing needs are growing exponentially, both in the traditional computing-intensive domains and in emerging, more diverse fields of research. The rise of machine learning and artificial intelligence applications has accelerated and broadened the use of computational resources in research ranging from creating new, more environmentally friendly materials to improving medicine in the fight against deadly diseases. Meeting this rapidly evolving landscape of national computational needs poses three main challenges: a shortage of capacity, increasingly diverse applications, and the need for computational literacy and training. This project aims to meet these challenges and transform the way computing is delivered by developing and deploying a composable advanced computing resource, Anvil, for the national research community, significantly increasing both computing capacity and accessibility. Anvil integrates a large-capacity high-performance computing (HPC) cluster with a comprehensive ecosystem of software, access interfaces, programming environments, and composable services to form a seamless environment able to support a broad range of current and future science and engineering applications. Through a carefully designed student training program and partnerships with regional and other universities, XSEDE, and Women in HPC programs, this project will develop computing competency in the next-generation workforce and engage and train a broader audience, including underrepresented students at minority-serving and EPSCoR (Established Program to Stimulate Competitive Research) institutions. Built on a forward-looking architecture with a high core count and improved memory bandwidth and I/O, Anvil can effectively support traditional HPC workloads, providing fast turnaround for high-throughput, mid-scale computational jobs.
Anvil consists of 1000 128-core compute nodes based on the next-generation AMD EPYC “Milan” architecture that can deliver a total peak performance of 5.3 petaflops. Each node has 256 GB of memory and 100 Gb/s of bandwidth over the Mellanox HDR InfiniBand interconnect, allowing multiple jobs of up to 1024 cores each to run at full speed across the interconnect fabric. These nodes are complemented by 32 large-memory nodes with 1 TB of RAM each and 16 NVIDIA GPU nodes with four “Volta Next” GPUs per node. The GPU nodes are capable of 1.57 petaflops of single-precision performance to support machine learning and a wide range of current and future science and engineering applications. Anvil’s multiple tiers of storage include a long-term archive, persistent file and campaign storage, a 10 PB scratch file system, a 3 PB flash burst buffer, and object storage to support a variety of workflows and storage needs. Anvil will lower the barrier to entry to advanced computing cyberinfrastructure (CI) by providing interactive computing and desktop environments that ease the transition to HPC for users from diverse domains. Through feature-rich interactive environments such as Open OnDemand and ThinLinc, users can rapidly become productive on Anvil via Linux and Windows desktops or through familiar browser-based tools (e.g., Jupyter, RStudio). Complex scientific software environments and application stacks will be supported via containers orchestrated within a powerful composable subsystem. Anvil supports cloud bursting of computational workloads, as well as use of public cloud machine learning platforms, including GPU and FPGA accelerators and software tools that automate hyperparameter tuning and algorithm selection for exploratory ML research.
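The per-node figures above compose into system-wide totals by simple multiplication. The following is a minimal Python sketch, assuming only the numbers quoted in this description; the constant names and the `anvil_totals` helper are hypothetical and illustrative. It deliberately leaves out the large-memory and GPU nodes' core counts and the storage tiers, because the description does not give per-node figures for those.

```python
import math

# Figures quoted in the project description; names are hypothetical/illustrative.
CPU_NODES = 1000
CORES_PER_NODE = 128
MEM_PER_CPU_NODE_GB = 256
LARGE_MEM_NODES = 32
MEM_PER_LARGE_NODE_TB = 1
GPU_NODES = 16
GPUS_PER_NODE = 4
FULL_SPEED_JOB_CORES = 1024


def anvil_totals() -> dict:
    """Derive system-wide totals from the per-node specifications (hypothetical helper)."""
    return {
        # Cores across the 1000 standard compute nodes only; core counts for
        # the large-memory and GPU nodes are not stated in the description.
        "cpu_cores": CPU_NODES * CORES_PER_NODE,
        # Aggregate memory of the standard nodes, in decimal terabytes.
        "cpu_node_memory_tb": CPU_NODES * MEM_PER_CPU_NODE_GB // 1000,
        # Aggregate RAM of the large-memory nodes, in terabytes.
        "large_memory_tb": LARGE_MEM_NODES * MEM_PER_LARGE_NODE_TB,
        # Total GPUs across the GPU nodes.
        "gpus": GPU_NODES * GPUS_PER_NODE,
        # Whole 128-core nodes needed to host one 1024-core job.
        "nodes_per_full_speed_job": math.ceil(FULL_SPEED_JOB_CORES / CORES_PER_NODE),
    }


if __name__ == "__main__":
    for name, value in anvil_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the sketch reports 128,000 standard-node cores, 256 TB of standard-node memory, 32 TB of large-memory RAM, 64 GPUs, and 8 nodes for a 1024-core job.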
An existing production-quality science gateway at Purdue will help XSEDE researchers share their data and tools online and will facilitate easy access to Anvil and other XSEDE resources for classroom instruction and training activities. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 10/1/20 to 9/30/27

Funding

  • National Science Foundation
