Abstract
This work addresses the challenge of scaling application codes based on the Asynchronous Many-Task (AMT) Uintah framework on the Department of Energy (DOE) Aurora exascale system by considering a demanding Reverse Monte Carlo Ray Tracing radiation benchmark calculation. The benchmark involves potentially global all-to-all communication and uses adaptive mesh refinement and ray tracing to achieve scalability. It has been used in previous scalability studies on several pre-exascale systems and on the DOE Frontier exascale system. This paper describes the steps taken to run the benchmark successfully on up to 10,240 nodes and 122,880 Intel® Ponte Vecchio Xe stacks of Aurora. This scalability was achieved with only a limited number of experiments, given the load on Aurora and the uniqueness of the machine. These experiments provide valuable lessons for achieving scalability at this level. The resulting scalability runs, while few in number, demonstrate relatively good strong-scaling characteristics. A detailed analysis of these results provides important indications about the path to scalability on Aurora for future work. Overall, these results continue to demonstrate the remarkable ability of this AMT approach to produce scalable solutions for challenging problems at extreme scale on heterogeneous architectures.
| Original language | English |
|---|---|
| Title of host publication | PEARC 2025 - Practice and Experience in Advanced Research Computing 2025 |
| Subtitle of host publication | The Power of Collaboration |
| Publisher | Association for Computing Machinery, Inc |
| ISBN (Electronic) | 9798400713989 |
| State | Published - Jul 18 2025 |
| Event | 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 - Columbus, United States. Duration: Jul 20 2025 → Jul 24 2025 |
Publication series
| Name | PEARC 2025 - Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration |
|---|---|
Conference
| Conference | 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 |
|---|---|
| Country/Territory | United States |
| City | Columbus |
| Period | 07/20/25 → 07/24/25 |
Funding
This material is based upon work originally supported by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work is associated with an ALCF Aurora Early Science Program project. This work was supported by the Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. General support for the software development and for Martin Berzins came from the University of Texas at Austin under Award Number(s) UTA19-001215 and a gift from the Intel oneAPI Center of Excellence at the University of Utah. We would like to thank the ALCF for early system access with special thanks to Michael D’Mello (Intel Corporation) and to Allen Sanderson (University of Utah).
Keywords
- Aurora
- Exascale
- Intel Ponte Vecchio
- Kokkos
- Mesh Refinement
- Radiation Modeling
- Reverse Monte Carlo Ray Tracing
- Scalability
- Uintah