Abstract
Scientific computing increasingly relies on surrogate models to accelerate high-fidelity simulations, enable real-time predictions, and facilitate exploration of the design space. However, building effective surrogates at scale presents several challenges: simulations are computationally expensive, data generation must be carefully managed, and surrogate learning requires handling large, heterogeneous, and dynamically evolving workflows. These challenges are amplified in active learning contexts, where surrogate models guide further data acquisition, which tightly couples simulation, inference, and model training. This paper introduces ROSE (RADICAL Orchestrator for Surrogate Exploration), a flexible, portable, and scalable software framework that supports the end-to-end lifecycle of surrogate modeling in high-performance computing environments. ROSE integrates active learning algorithms with scalable orchestration, managing asynchronous execution across diverse computing resources while minimizing user burden. It supports both in-situ and ex-situ workflows as well as online and offline training, and it accommodates the dynamic structure of adaptive sampling and surrogate refinement. We apply ROSE to three scientific use cases: electrolyte structure extraction, neutron diffraction structure recovery, and colloid phase classification. Across Polaris, Perlmutter, and Delta, ROSE sustains high throughput with low orchestration overhead and delivers 4-8× end-to-end speedups in these use cases by exploiting parallel, pilot-based execution; relative to synchronous baselines, asynchronous orchestration typically yields 1.5-3× speedups.
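The coupling of simulation, training, and sample selection described above can be illustrated with a small asynchronous active-learning loop. This is a hypothetical sketch and does not use ROSE's API: `simulate`, `Surrogate`, `select`, and `active_learning` are invented stand-ins, Python's `asyncio` plays the role of the task runtime, and distance to the nearest labeled point is a crude placeholder for real model uncertainty. Whenever any simulation finishes, its result updates the surrogate and a new sample is launched immediately, instead of waiting for a whole batch to complete.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only; none of these names come from ROSE.


async def simulate(x: float) -> float:
    """Placeholder for an expensive high-fidelity simulation with variable runtime."""
    await asyncio.sleep(0.01 * (1 + x % 3))
    return x * x


@dataclass
class Surrogate:
    """Toy surrogate that simply records labeled points."""
    xs: list = field(default_factory=list)
    ys: list = field(default_factory=list)

    def train(self, x: float, y: float) -> None:
        # Online update: add one new labeled sample.
        self.xs.append(x)
        self.ys.append(y)

    def uncertainty(self, x: float) -> float:
        # Crude proxy: distance to the nearest labeled point.
        if not self.xs:
            return float("inf")
        return min(abs(x - xi) for xi in self.xs)


def select(model: Surrogate, pool: list, taken: set) -> float:
    """Pick the untaken candidate with the highest (proxy) uncertainty."""
    candidates = [c for c in pool if c not in taken]
    return max(candidates, key=model.uncertainty)


async def active_learning(pool: list, budget: int, workers: int) -> Surrogate:
    model = Surrogate()
    taken: set = set()
    running: dict = {}  # maps each in-flight task to its input point
    budget = min(budget, len(pool))

    def launch() -> None:
        x = select(model, pool, taken)
        taken.add(x)
        running[asyncio.create_task(simulate(x))] = x

    # Fill all workers up front.
    for _ in range(min(workers, budget)):
        launch()

    # Asynchronous loop: react to each completed simulation individually.
    while running:
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            x = running.pop(task)
            model.train(x, task.result())
            if len(taken) < budget:
                launch()  # refill the freed worker right away
    return model
```

For example, `asyncio.run(active_learning([float(i) for i in range(10)], budget=6, workers=3))` labels six distinct points while keeping three simulations in flight. A synchronous variant would instead gather each batch (for example with `asyncio.gather`) before retraining, so the slowest simulation in every batch would leave the other workers idle; that idle time is what asynchronous orchestration aims to recover.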
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 61-70 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798400718717 |
| DOIs | |
| State | Published - Nov 15 2025 |
| Event | 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops, St. Louis, United States. Duration: Nov 16 2025 → Nov 21 2025 |
Publication series
| Name | Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
|---|---|
Conference
| Conference | 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
|---|---|
| Country/Territory | United States |
| City | St. Louis |
| Period | 11/16/25 → 11/21/25 |
Funding
Funding sources: NSF 2212549 and DOE ASCR DE-SC0021352. We thank Andre Merzky and Mikhail Titov for their contributions to RADICAL-Cybertools, and Ozgur Kilic and Matteo Turilli for useful discussions.
Keywords
- Artificial Intelligence
- Coupled Physics-based and AI
- Machine Learning
- Runtime System
- Scalable Workflow