Scaling Uintah on the Aurora Exascale System up to 122,880 Intel Ponte Vecchio Xe Stacks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The challenge of being able to scale application codes based on the Asynchronous Many-Task (AMT) Uintah framework on the Department of Energy (DOE) Aurora exascale system is addressed in this work by considering a challenging Reverse Monte Carlo Ray Tracing radiation benchmark calculation. This benchmark involves potentially global all-to-all communication and uses adaptive mesh refinement and ray tracing to achieve scalability. This benchmark has been used as part of previous scalability studies on a number of pre-exascale systems and on the DOE Frontier exascale system. This paper describes steps taken to enable this benchmark to run successfully on up to 10,240 nodes and 122,880 Intel® Ponte Vecchio Xe stacks on the DOE Aurora exascale system. This scalability was achieved through a limited number of experiments on Aurora, given machine loads and its uniqueness. These experiments constitute valuable lessons learned to achieve scalability at this level. The resulting scalability runs, while few in number, demonstrate relatively good strong-scaling characteristics. A detailed analysis of these results provides important indications about the path to scalability on Aurora for future work. Overall, these results continue the remarkable ability of this AMT approach to produce scalable solutions for challenging problems at extreme scale on heterogeneous architectures.
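
As a quick check on the figures quoted above, the following minimal Python sketch divides the stated stack count by the stated node count. The result is consistent with Aurora's node layout of six Ponte Vecchio GPUs with two Xe stacks each. The sketch also includes the standard definition of strong-scaling efficiency. The function names are hypothetical and chosen for illustration; the abstract reports no timings, so none are supplied here.

```python
# Illustrative sketch only. The node and stack counts come from the abstract;
# the helper names are assumptions, and no timing data from the paper is used.

NODES = 10_240        # largest node count stated in the abstract
XE_STACKS = 122_880   # largest Xe stack count stated in the abstract


def stacks_per_node(stacks: int, nodes: int) -> int:
    """Return the number of stacks per node, requiring an even split."""
    if stacks % nodes != 0:
        raise ValueError("stacks do not divide evenly across nodes")
    return stacks // nodes


def strong_scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Standard strong-scaling efficiency for a fixed problem size.

    t_base is the run time on n_base resources and t_n the run time on n
    resources; a value of 1.0 corresponds to ideal scaling.
    """
    return (t_base * n_base) / (t_n * n)


print(stacks_per_node(XE_STACKS, NODES))  # 12
```

Given measured run times at two node counts, `strong_scaling_efficiency` would quantify how close a pair of runs comes to ideal scaling.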

Original language: English
Title of host publication: PEARC 2025 - Practice and Experience in Advanced Research Computing 2025
Subtitle of host publication: The Power of Collaboration
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400713989
State: Published - Jul 18 2025
Event: 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 - Columbus, United States
Duration: Jul 20 2025 to Jul 24 2025

Publication series

Name: PEARC 2025 - Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration

Conference

Conference: 2025 Practice and Experience in Advanced Research Computing, PEARC 2025
Country/Territory: United States
City: Columbus
Period: 07/20/25 to 07/24/25

Funding

This material is based upon work originally supported by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work is associated with an ALCF Aurora Early Science Program project. This work was supported by the Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. General support for the software development and for Martin Berzins came from the University of Texas at Austin under Award Number(s) UTA19-001215 and a gift from the Intel oneAPI Center of Excellence at the University of Utah. We would like to thank the ALCF for early system access with special thanks to Michael D’Mello (Intel Corporation) and to Allen Sanderson (University of Utah).

Keywords

  • Aurora
  • Exascale
  • Intel Ponte Vecchio
  • Kokkos
  • Mesh Refinement
  • Radiation Modeling
  • Reverse Monte Carlo Ray Tracing
  • Scalability
  • Uintah
