Lessons Learned and Scalability Achieved When Porting Uintah to DOE Exascale Systems

  • John K. Holmen
  • Marta García
  • Allen Sanderson
  • Abhishek Bagusetty
  • Martin Berzins

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

A key challenge faced when preparing codes for Department of Energy (DOE) exascale systems was designing scalable applications for systems featuring hardware and software not yet available at leadership-class scale. With such systems now available, it is important to evaluate scalability of the resulting software solutions on these target systems. One such code designed with the exascale DOE Aurora and DOE Frontier systems in mind is the Uintah Computational Framework, an open-source asynchronous many-task (AMT) runtime system. To prepare for exascale, Uintah adopted a portable MPI+X hybrid parallelism approach using the Kokkos performance portability library (i.e., MPI+Kokkos). This paper complements recent work with additional details and an evaluation of the resulting approach on Aurora and Frontier. Results are shown for a challenging benchmark demonstrating interoperability of 3 portable codes essential to Uintah-related combustion research. These results demonstrate single-source portability across Aurora and Frontier with scaling characteristics shown to 3,072 Aurora nodes and 9,216 Frontier nodes. In addition to showing results run to new scales on new systems, this paper also discusses lessons learned through efforts preparing Uintah for exascale systems.

Original language: English
Title of host publication: Euro-Par 2024
Subtitle of host publication: Parallel Processing Workshops - Euro-Par 2024 International Workshops, Proceedings
Editors: Silvina Caino-Lores, Demetris Zeinalipour, Thaleia Dimitra Doudali, David E. Singh, Gracia Ester Martín Garzón, Leonel Sousa, Diego Andrade, Tommaso Cucinotta, Donato D'Ambrosio, Patrick Diehl, Manuel F. Dolz, Admela Jukan, Raffaele Montella, Matteo Nardelli, Marta Garcia-Gasulla, Sarah Neuwirth
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 231-242
Number of pages: 12
ISBN (Print): 9783031901997
State: Published - 2025
Event: 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024 - Madrid, Spain
Duration: Aug 26 2024 → Aug 30 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15385 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024
Country/Territory: Spain
City: Madrid
Period: 08/26/24 → 08/30/24

Funding

This material is based upon work originally supported by the Department of Energy, National Nuclear Security Administration, under Award Number(s) DE-NA0002375. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This work is associated with an ALCF Aurora Early Science Program project. This work was supported by the Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Support for Allen Sanderson comes from the University of Texas at Austin under Award Number(s) UTA19-001215 and a gift from the Intel One API Centers Program. We would like to thank the ALCF and OLCF for early system access with special thanks to Varsha Madananth.

Keywords

  • Asynchronous Many-Task Runtime System
  • Exascale
  • Parallelism and Concurrency
  • Performance Portability
  • Portability
  • Software Engineering
