Abstract
The National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory has a long history of deploying ground-breaking leadership-class supercomputers for the U.S. Department of Energy. The latest in this line of supercomputers is Frontier, the first supercomputer to break the exascale barrier (10^18 floating-point operations per second) on the TOP500 list. Frontier serves a wide array of scientific domains, from traditional simulation-based workloads to newer AI and Machine Learning workloads. To best serve the NCCS user community, NCCS uses Spack to deploy a comprehensive software stack of scientific software packages, providing straightforward access to these packages through Lmod Environment Modules. Maintaining a large software stack while also including multiple new compiler releases each year is a very time-consuming task. Additionally, it is not straightforward to provide a software stack alongside existing vendor-provided software such as the HPE/Cray Programming Environment (CPE), and existing CPE, Spack, and Lmod integration does not allow for multiple versions of GPU libraries such as AMD's ROCm to be used. To address these challenges and shortcomings, NCCS has developed the NCCS Software Provisioning tool (NSP), a tool for deploying and monitoring software stacks on HPC systems. NSP allows NCCS to quickly and effectively provision software stacks from the ground up using template-driven recipes and configuration files. NSP is successfully deployed on Frontier and several other NCCS clusters, enabling the NCCS software team to quickly deploy software stacks for newly-released compilers, expand current software offerings, better support GPU-based software, and monitor Lmod module usage to identify unused software packages that can be removed from the software stack.
In this work, we discuss the shortcomings of the previous CPE, Spack, and Lmod usage at NCCS, provide further details on the implementation and structure of NSP, then discuss the benefits that NSP provides.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of CUG 2025 - Cray User Group Conference |
| Editors | Ashley Barker, Bilel Hadri, Colleen Bertoni, Nick Hagerty, Timothy W. Robinson |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 127-134 |
| Number of pages | 8 |
| ISBN (Electronic) | 9798400713279 |
| State | Published - Nov 11 2025 |
| Event | Cray User Group, CUG 2025 - Jersey City, United States; Duration: May 4 2025 → May 8 2025 |
Publication series
| Name | Proceedings of CUG 2025 - Cray User Group Conference |
|---|---|
Conference
| Conference | Cray User Group, CUG 2025 |
|---|---|
| Country/Territory | United States |
| City | Jersey City |
| Period | 05/4/25 → 05/8/25 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Keywords
- Configuration as code
- High-Performance Computing
- Software deployments
- Spack