Modeling the Performance of Scientific Workflow Executions on HPC Platforms with Burst Buffers

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

13 Scopus citations

Abstract

Scientific domains ranging from bioinformatics to astronomy and earth science rely on traditional high-performance computing (HPC) codes, often encapsulated in scientific workflows. In contrast to traditional HPC codes that employ a few programming and runtime approaches that are highly optimized for HPC platforms, scientific workflows are not necessarily optimized for these platforms. As an effort to reduce the gap between compute and I/O performance, HPC platforms have adopted intermediate storage layers known as burst buffers. A burst buffer (BB) is a fast storage layer positioned between the global parallel file system and the compute nodes. Two designs currently exist: (i) shared, where the BBs are located on dedicated nodes; and (ii) on-node, in which each compute node embeds a private BB. In this paper, using accurate simulations and realworld experiments, we study how to best use these new storage layers when executing scientific workflows. These applications are not necessarily optimized to run on HPC systems, and thus can exhibit I/O patterns that differ from that of HPC codes. Thus, we first characterize the I/O behaviors of a real-world workflow under different configuration scenarios on two leadership-class HPC systems (Cori at NERSC and Summit at ORNL). Then, we use these characterizations to calibrate a simulator for workflow executions on HPC systems featuring shared and private BBs. Last, we evaluate our approach against a large I/O-intensive workflow, and we provide insights on the performance levels and the potential limitations of these two BBs architectures.

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE International Conference on Cluster Computing, CLUSTER 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages92-103
Number of pages12
ISBN (Electronic)9781728166773
DOIs
StatePublished - Sep 2020
Externally publishedYes
Event22nd IEEE International Conference on Cluster Computing, CLUSTER 2020 - Kobe, Japan
Duration: Sep 14 2020Sep 17 2020

Publication series

NameProceedings - IEEE International Conference on Cluster Computing, ICCC
Volume2020-September
ISSN (Print)1552-5244

Conference

Conference22nd IEEE International Conference on Cluster Computing, CLUSTER 2020
Country/TerritoryJapan
CityKobe
Period09/14/2009/17/20

Funding

This work is funded by DOE contract number #DESC0012636; and partly funded by NSF contracts #1664162, #1741040, #1923539, and #1923621. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We would also like to thank C. Daley and L. Ramakrishnan from NERSC for their help with the SWarp workflow.

Keywords

  • Burst buffers
  • High-Performance Computing
  • Performance Characterization
  • Scientific workflow
  • Simulations

Fingerprint

Dive into the research topics of 'Modeling the Performance of Scientific Workflow Executions on HPC Platforms with Burst Buffers'. Together they form a unique fingerprint.

Cite this