Understanding the Impact of Data Staging for Coupled Scientific Workflows

Ana Gainaru, Lipeng Wan, Ruonan Wang, Eric Suchyta, Jieyang Chen, Norbert Podhorszki, James Kress, David Pugmire, Scott Klasky

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

The rate at which data is generated by cutting-edge experimental science facilities and large-scale simulations enabled by current high-performance computing (HPC) systems has continued to grow far faster than the network and storage capabilities on which these systems rely. To cope with this challenge, scientists are moving toward the creation of autonomous experiments and HPC simulations using machine learning. However, efficiently moving, storing, and processing large amounts of data away from the point of origin remains a major challenge. In-memory computing, in situ analysis, data staging, and data streaming are recognized as viable alternatives to traditional file-based methods for transferring data between coupled workflows. However, the performance trade-offs and limitations of these methods are not fully understood when they are used in HPC applications. This article presents a comprehensive performance assessment of the current solutions for data staging when applied to applications that are not necessarily I/O intensive, which makes them less than ideal candidates for these methods. Our study is based on experiments run at scale on Oak Ridge National Laboratory's Summit supercomputer using applications and simulations that cover typical computational motifs and patterns. We investigate the usability and cost/benefit trade-offs of staging algorithms for HPC applications under different scenarios and highlight opportunities for optimizing the dataflow between coupled simulation workflows.

Original language: English
Pages (from-to): 4134-4147
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 33
Issue number: 12
State: Published - Dec 1 2022

Keywords

  • Data staging
  • coupled simulations
  • data management
  • data streaming
  • high-performance computing
  • in situ analytics
  • workflow
