Abstract
Coupled AI-simulation workflows are becoming major workloads for HPC facilities, and their increasing complexity necessitates new tools for performance analysis and for prototyping new in-situ workflows. We present SimAI-Bench, a tool designed to both prototype and evaluate these coupled workflows. In this paper, we use SimAI-Bench to benchmark the data transport performance of two common patterns on the Aurora supercomputer: a one-to-one workflow with co-located simulation and AI training instances, and a many-to-one workflow in which a single AI model is trained from an ensemble of simulations. For the one-to-one pattern, our analysis shows that node-local and DragonHPC data staging strategies deliver excellent performance compared to Redis and the Lustre file system. For the many-to-one pattern, we find that data transport becomes a dominant bottleneck as the ensemble size grows, and our evaluation reveals that the file system is the optimal solution among the tested strategies.
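To illustrate the one-to-one pattern described above, the following is a minimal, hypothetical sketch (not SimAI-Bench's actual API) of a co-located simulation staging each step's output to node-local storage, from which an AI trainer on the same node consumes it. All function and file names here are illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path

def simulate_step(step):
    """Stand-in for a simulation producing one step's training sample."""
    return {"step": step, "field": [step * 0.1 * i for i in range(4)]}

def stage_write(staging_dir, step, sample):
    """Producer side: publish one step's sample to node-local staging.
    Write to a temp file first, then rename, so the consumer never
    observes a partially written file."""
    tmp = staging_dir / f".step_{step}.json.tmp"
    tmp.write_text(json.dumps(sample))
    tmp.rename(staging_dir / f"step_{step}.json")

def stage_read(staging_dir, step):
    """Consumer side: read one staged step's sample for training."""
    return json.loads((staging_dir / f"step_{step}.json").read_text())

if __name__ == "__main__":
    # A temp directory stands in for node-local storage (e.g. /tmp or SSD).
    staging = Path(tempfile.mkdtemp())
    for step in range(3):
        stage_write(staging, step, simulate_step(step))
    batch = [stage_read(staging, s) for s in range(3)]
    print(len(batch), batch[0]["step"])
```

Because producer and consumer share a node, this staging path avoids the shared-file-system and network traffic that, per the abstract, dominates the many-to-one case.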
| Original language | English |
|---|---|
| Title of host publication | Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 985-996 |
| Number of pages | 12 |
| ISBN (Electronic) | 9798400718717 |
| DOIs | |
| State | Published - Nov 15 2025 |
| Event | 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops - St. Louis, United States |
| Duration | Nov 16 2025 → Nov 21 2025 |
Publication series
| Name | Proceedings of 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
|---|
Conference
| Conference | 2025 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC 2025 Workshops |
|---|---|
| Country/Territory | United States |
| City | St. Louis |
| Period | 11/16/25 → 11/21/25 |
Funding
This research used resources of the Argonne Leadership Computing Facility, a U.S. Department of Energy (DOE) Office of Science user facility at Argonne National Laboratory and is based on research supported by the U.S. DOE Office of Science-Advanced Scientific Computing Research Program, under Contract No. DE-AC02-06CH11357.
Keywords
- Benchmarking
- Mini-app
- Workflows