NVIDIA’s Cloud Native Supercomputing

Gilad Shainer, Richard Graham, Chris J. Newburn, Oscar Hernandez, Gil Bloch, Tom Gibbs, Jack C. Wells

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

NVIDIA is defining a High-Performance Computing system architecture called Cloud Native Supercomputing to provide bare-metal system performance with security isolation and functional offload capabilities. Cloud Native Supercomputing delivers a cloud-based user experience while maintaining the performance and scalability that supercomputing facilities uniquely deliver. This new set of capabilities is driven by the need to accommodate new scientific workflows that combine traditional simulation with experimental data from the edge and with AI, data analytics and visualization frameworks in an integrated and even real-time fashion. These new workflows stress the system management, security and non-computational functions of traditional cloud or supercomputing facilities. Specifically, they involve data from untrusted (or non-local) sources, user experiences that range from Jupyter notebooks and interactive jobs to Gordon Bell-class capacity batch runs, and I/O patterns that are unique to the emerging mix of in silico and live data sources. To achieve these objectives, we introduce a new architectural component called the Data Processing Unit (DPU), which in early embodiments is a system-on-a-chip (SoC) that includes an InfiniBand (IB) and Ethernet network adapter, programmable Arm cores, memory, PCI switches, and custom accelerators. The BlueField-1 and BlueField-2 devices are NVIDIA’s first DPU instances. This paper describes the architecture of cloud native supercomputing systems that use DPUs for isolation and acceleration, along with the system services provided by the DPU. These services provide enhanced security through isolation, file-system management capabilities, monitoring, and offloaded support for communication libraries.

Original language: English
Title of host publication: Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Revised Selected Papers
Editors: Jeffrey Nichols, Arthur ‘Barney’ Maccabe, James Nutaro, Swaroop Pophale, Pravallika Devineni, Theresa Ahearn, Becky Verastegui
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 340-357
Number of pages: 18
ISBN (Print): 9783030964979
DOIs
State: Published - 2022
Externally published: Yes
Event: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021 - Virtual, Online
Duration: Oct 18 2021 to Oct 20 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1512 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021
City: Virtual, Online
Period: 10/18/21 to 10/20/21

Keywords

  • Artificial intelligence
  • Cloud computing
  • Data processing unit
  • High performance computing
  • Storage
