Transitioning from File-Based HPC Workflows to Streaming Data Pipelines with openPMD and ADIOS2

Franz Poeschel, Juncheng E, William F. Godoy, Norbert Podhorszki, Scott Klasky, Greg Eisenhauer, Philip E. Davis, Lipeng Wan, Ana Gainaru, Junmin Gu, Fabian Koller, René Widera, Michael Bussmann, Axel Huebl

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need for strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.

Original language: English
Title of host publication: Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Revised Selected Papers
Editors: Jeffrey Nichols, Arthur ‘Barney’ Maccabe, James Nutaro, Swaroop Pophale, Pravallika Devineni, Theresa Ahearn, Becky Verastegui
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 99-118
Number of pages: 20
ISBN (Print): 9783030964979
State: Published - 2022
Event: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021 - Virtual, Online
Duration: Oct 18 2021 to Oct 20 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1512 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021
City: Virtual, Online
Period: 10/18/21 to 10/20/21

Funding

Acknowledgements. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration). Supported by EC through Laserlab-Europe, H2020 EC-GA 871124. Supported by the Consortium for Advanced Modeling of Particle Accelerators (CAMPA), funded by the U.S. DOE Office of Science under Contract No. DE-AC02-05CH11231. This work was partially funded by the Center of Advanced Systems Understanding (CASUS), which is financed by Germany’s Federal Ministry of Education and Research (BMBF) and by the Saxon Ministry for Science, Culture and Tourism (SMWK) with tax funds on the basis of the budget approved by the Saxon State Parliament.

Funders | Funder number
Center of Advanced Systems Understanding |
Consortium for Advanced Modeling of Particle Accelerators |
Saxon Ministry for Science, Culture and Tourism |
U.S. Department of Energy organizations |
U.S. Department of Energy | 17-SC-20-SC, DE-AC05-00OR22725
Office of Science | DE-AC02-05CH11231
National Nuclear Security Administration | H2020 EC-GA 871124
Horizon 2020 Framework Programme | 871124
Bundesministerium für Bildung und Forschung |
Sächsisches Staatsministerium für Wissenschaft und Kunst |

Keywords

• Big Data
• High performance computing
• RDMA
• Streaming
