Abstract
We provide an overview of software engineering efforts in QMCPACK and their impact. QMCPACK is a production-level, open-source ab-initio Quantum Monte Carlo code targeting high-performance computing (HPC) systems. The aspects covered are: (i) strategic expansion of continuous integration (CI) targeting CPUs, using GitHub Actions' own runners, and the NVIDIA and AMD GPUs used in pre-exascale systems; (ii) incremental reduction of memory leaks using sanitizers; (iii) incorporation of Docker containers for CI and reproducibility; and (iv) refactoring efforts to improve maintainability, test coverage, and memory lifetime management. We quantify the value of these improvements with metrics that illustrate the shift towards a predictive, rather than reactive, maintenance approach. By documenting the impact of these efforts on QMCPACK, we aim to contribute to the body of knowledge on the importance of research software engineering (RSE) for the stewardship and advancement of community HPC codes that enable scientific discovery at scale.
| Original language | English |
|---|---|
| Article number | 107502 |
| Journal | Future Generation Computer Systems |
| Volume | 163 |
| State | Published - Feb 2025 |
Funding
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. After that project ended at the end of 2023, support was provided by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program and Center for Predictive Simulation of Functional Materials.
Keywords
- CI
- High-performance computing
- Memory safety
- QMCPACK
- Software engineering