Deploying and Tracking Software with NCCS Software Provisioning

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory has a long history of deploying ground-breaking leadership-class supercomputers for the U.S. Department of Energy. The latest in this line of supercomputers is Frontier, the first supercomputer to break the exascale barrier (10^18 floating-point operations per second) on the TOP500 list. Frontier serves a wide array of scientific domains, from traditional simulation-based workloads to newer AI and Machine Learning workloads. To best serve the NCCS user community, NCCS uses Spack to deploy a comprehensive software stack of scientific software packages, providing straightforward access to these packages through Lmod Environment Modules. Maintaining a large software stack while also including multiple new compiler releases each year is a very time-consuming task. Additionally, it is not straightforward to provide a software stack alongside existing vendor-provided software such as the HPE/Cray Programming Environment (CPE), and existing CPE, Spack, and Lmod integration does not allow for multiple versions of GPU libraries such as AMD's ROCm to be used. To address these challenges and shortcomings, NCCS has developed the NCCS Software Provisioning tool (NSP), a tool for deploying and monitoring software stacks on HPC systems. NSP allows NCCS to quickly and effectively provision software stacks from the ground up using template-driven recipes and configuration files. NSP is successfully deployed on Frontier and several other NCCS clusters, enabling the NCCS software team to quickly deploy software stacks for newly-released compilers, expand current software offerings, better support GPU-based software, and monitor Lmod module usage to identify unused software packages that can be removed from the software stack.
In this work, we discuss the shortcomings of the previous CPE, Spack, and Lmod usage at NCCS, provide further details on the implementation and structure of NSP, then discuss the benefits that NSP provides.

Original language: English
Title of host publication: Proceedings of CUG 2025 - Cray User Group Conference
Editors: Ashley Barker, Bilel Hadri, Colleen Bertoni, Nick Hagerty, Timothy W. Robinson
Publisher: Association for Computing Machinery, Inc
Pages: 127-134
Number of pages: 8
ISBN (Electronic): 9798400713279
State: Published - Nov 11 2025
Event: Cray User Group, CUG 2025 - Jersey City, United States
Duration: May 4 2025 to May 8 2025

Publication series

Name: Proceedings of CUG 2025 - Cray User Group Conference

Conference

Conference: Cray User Group, CUG 2025
Country/Territory: United States
City: Jersey City
Period: 05/4/25 to 05/8/25

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • Configuration as code
  • High-Performance Computing
  • Software deployments
  • Spack
