Abstract
A key challenge in preparing codes for Department of Energy (DOE) exascale systems was designing scalable applications for systems featuring hardware and software not yet available at leadership-class scale. With such systems now available, it is important to evaluate the scalability of the resulting software solutions on these target systems. One such code, designed with the DOE Aurora and DOE Frontier exascale systems in mind, is the Uintah Computational Framework, an open-source asynchronous many-task (AMT) runtime system. To prepare for exascale, Uintah adopted a portable MPI+X hybrid parallelism approach using the Kokkos performance portability library (i.e., MPI+Kokkos). This paper complements recent work with additional details and an evaluation of the resulting approach on Aurora and Frontier. Results are shown for a challenging benchmark demonstrating interoperability of three portable codes essential to Uintah-related combustion research. These results demonstrate single-source portability across Aurora and Frontier, with scaling characteristics shown to 3,072 Aurora nodes and 9,216 Frontier nodes. In addition to presenting results at new scales on new systems, this paper discusses lessons learned while preparing Uintah for exascale systems.
| Original language | English |
|---|---|
| Title of host publication | Euro-Par 2024 |
| Subtitle of host publication | Parallel Processing Workshops - Euro-Par 2024 International Workshops, Proceedings |
| Editors | Silvina Caino-Lores, Demetris Zeinalipour, Thaleia Dimitra Doudali, David E. Singh, Gracia Ester Martín Garzón, Leonel Sousa, Diego Andrade, Tommaso Cucinotta, Donato D'Ambrosio, Patrick Diehl, Manuel F. Dolz, Admela Jukan, Raffaele Montella, Matteo Nardelli, Marta Garcia-Gasulla, Sarah Neuwirth |
| Publisher | Springer Science and Business Media Deutschland GmbH |
| Pages | 231-242 |
| Number of pages | 12 |
| ISBN (Print) | 9783031901997 |
| State | Published - 2025 |
| Event | 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024 - Madrid, Spain. Duration: Aug 26, 2024 → Aug 30, 2024 |
Publication series
| Name | Lecture Notes in Computer Science |
|---|---|
| Volume | 15385 LNCS |
| ISSN (Print) | 0302-9743 |
| ISSN (Electronic) | 1611-3349 |
Conference
| Conference | 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024 |
|---|---|
| Country/Territory | Spain |
| City | Madrid |
| Period | 08/26/24 → 08/30/24 |
Funding
This material is based upon work originally supported by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work is associated with an ALCF Aurora Early Science Program project. This work was supported by the Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Support for Allen Sanderson comes from the University of Texas at Austin under Award Number(s) UTA19-001215 and a gift from the Intel One API Centers Program. We would like to thank the ALCF and OLCF for early system access with special thanks to Varsha Madananth.
Keywords
- Asynchronous Many-Task Runtime System
- Exascale
- Parallelism and Concurrency
- Performance Portability
- Portability
- Software Engineering