Abstract
Science applications preparing for the exascale era are increasingly exploring in situ computations comprising simulation-analysis-reduction pipelines coupled in memory. Efficient composition and execution of such complex pipelines for a target platform is a codesign process that evaluates the impact and tradeoffs of various application- and system-specific parameters. In this article, we describe a toolset for automating performance studies of composed HPC applications that perform online data reduction and analysis. We describe Cheetah, a new framework for composing parametric studies on coupled applications, and Savanna, a runtime engine for orchestrating and executing campaigns of codesign experiments. This toolset facilitates understanding the impact of various factors such as process placement, synchronicity of algorithms, and storage versus compute requirements for online analysis of large data. Ultimately, we aim to create a catalog of performance results that can help scientists understand tradeoffs when designing next-generation simulations that make use of online processing techniques. We illustrate the design of Cheetah and Savanna, and present application examples that use this framework to conduct codesign studies on small clusters as well as leadership-class supercomputers.
Original language | English |
---|---|
Article number | e6519 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 34 |
Issue number | 14 |
State | Published - Jun 25 2022 |
Funding
Notice: This article has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this article, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research was supported in part by the Exascale Computing Project (17-SC-20-SC) of the US Department of Energy (DOE), and by DOE's Advanced Scientific Research Office (ASCR) under contract DE-AC02-06CH11357. In addition, this research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory and of the National Energy Research Scientific Computing Center, which are supported by the Office of Science of the US Department of Energy under Contract Numbers DE-AC05-00OR22725 and DE-AC02-05CH11231, respectively.
Keywords
- CODAR
- Cheetah
- Savanna
- codesign
- exascale
- in situ
- online
- reduction
- workflows