TY - GEN
T1 - Transitioning from File-Based HPC Workflows to Streaming Data Pipelines with openPMD and ADIOS2
AU - Poeschel, Franz
AU - E, Juncheng
AU - Godoy, William F.
AU - Podhorszki, Norbert
AU - Klasky, Scott
AU - Eisenhauer, Greg
AU - Davis, Philip E.
AU - Wan, Lipeng
AU - Gainaru, Ana
AU - Gu, Junmin
AU - Koller, Fabian
AU - Widera, René
AU - Bussmann, Michael
AU - Huebl, Axel
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need of strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.
AB - This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome and flexibly extended for in situ analysis. The openPMD-api is a library for the description of scientific data according to the Open Standard for Particle-Mesh Data (openPMD). Its approach towards recent challenges posed by hardware heterogeneity lies in the decoupling of data description in domain sciences, such as plasma physics simulations, from concrete implementations in hardware and IO. The streaming backend is provided by the ADIOS2 framework, developed at Oak Ridge National Laboratory. This paper surveys two openPMD-based loosely-coupled setups to demonstrate flexible applicability and to evaluate performance. In loose coupling, as opposed to tight coupling, two (or more) applications are executed separately, e.g. in individual MPI contexts, yet cooperate by exchanging data. This way, a streaming-based workflow allows for standalone codes instead of tightly-coupled plugins, using a unified streaming-aware API and leveraging high-speed communication infrastructure available in modern compute clusters for massive data exchange. We determine new challenges in resource allocation and in the need of strategies for a flexible data distribution, demonstrating their influence on efficiency and scaling on the Summit compute system. The presented setups show the potential for a more flexible use of compute resources brought by streaming IO as well as the ability to increase throughput by avoiding filesystem bottlenecks.
KW - Big Data
KW - High performance computing
KW - RDMA
KW - Streaming
UR - http://www.scopus.com/inward/record.url?scp=85127088037&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-96498-6_6
DO - 10.1007/978-3-030-96498-6_6
M3 - Conference contribution
AN - SCOPUS:85127088037
SN - 9783030964979
T3 - Communications in Computer and Information Science
SP - 99
EP - 118
BT - Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Revised Selected Papers
A2 - Nichols, Jeffrey
A2 - Maccabe, Arthur ‘Barney’
A2 - Nutaro, James
A2 - Pophale, Swaroop
A2 - Devineni, Pravallika
A2 - Ahearn, Theresa
A2 - Verastegui, Becky
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021
Y2 - 18 October 2021 through 20 October 2021
ER -