Accelerating Flash-X Simulations with Asynchronous I/O

Rajeev Jain, Houjun Tang, Akash Dhruv, J. Austin Harris, Suren Byna

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Most high-fidelity physics simulation codes, such as Flash-X, need to save intermediate results (checkpoint files) to enable restarts or to gain insight into the evolution of the simulation. These codes save such intermediate files synchronously: computation is stalled while the data is written to storage. Depending on the problem size and computational requirements, this file write time can be a substantial portion of the total simulation time. To hide the I/O latency of checkpointing, asynchronous I/O methods have been introduced. These methods use background threads to perform I/O while the main threads continue with the simulation. However, the background threads can compete with the simulation for resources on the node as well as with its communication. In this paper, we evaluate the overheads and the overall benefit of asynchronous I/O in HDF5 for simulations. Results from real-world high-fidelity simulations on the Summit supercomputer show that I/O operations overlap with application communication, computation, or both, effectively hiding some or all of the I/O latency. Our evaluation shows that while asynchronous I/O adds overhead to the application, the reduction in I/O time is more significant, resulting in an overall performance speedup of up to 1.5X.
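As a rough illustration of the overlap described in the abstract, the following minimal Python sketch hands each checkpoint write to a background thread while the main loop keeps computing. It is not the paper's implementation and does not use HDF5 or Flash-X: the names `advance`, `write_checkpoint`, and `run`, the JSON file format, and the toy update rule are all assumptions made for this example.

```python
import json
import os
import tempfile
import threading


def write_checkpoint(path, state):
    """Blocking write of one snapshot (a stand-in for a checkpoint file)."""
    with open(path, "w") as f:
        json.dump(state, f)


def advance(state):
    """Hypothetical compute step: a toy update standing in for a solver."""
    return {
        "step": state["step"] + 1,
        "values": [v * 0.5 + 1.0 for v in state["values"]],
    }


def run(num_steps, checkpoint_every, out_dir):
    """Advance the toy simulation, writing checkpoints in the background."""
    state = {"step": 0, "values": [0.0] * 8}
    pending = None  # at most one checkpoint write in flight
    written = []
    for _ in range(num_steps):
        state = advance(state)
        if state["step"] % checkpoint_every == 0:
            if pending is not None:
                # Any I/O time not yet hidden by computation is paid here.
                pending.join()
            path = os.path.join(out_dir, f"chk_{state['step']:04d}.json")
            # advance() builds a new dict each step, so the snapshot handed
            # to the writer thread is never modified by later computation.
            pending = threading.Thread(
                target=write_checkpoint, args=(path, state)
            )
            pending.start()  # the loop continues while the write proceeds
            written.append(path)
    if pending is not None:
        pending.join()  # drain outstanding I/O before returning
    return state, written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        final_state, paths = run(num_steps=6, checkpoint_every=2, out_dir=out_dir)
        print(final_state["step"], [os.path.basename(p) for p in paths])
```

In this sketch the `join()` before each new write is where any I/O latency that computation did not hide would surface, and the extra thread is the kind of background activity that can compete with the application for node resources.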

Original language: English
Title of host publication: Proceedings of PDSW 2022
Subtitle of host publication: 7th International Parallel Data Systems Workshop, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 13-19
Number of pages: 7
ISBN (Electronic): 9781665475624
State: Published - 2022
Event: 7th IEEE/ACM International Parallel Data Systems Workshop, PDSW 2022 - Dallas, United States
Duration: Nov 13 2022 to Nov 18 2022

Publication series

Name: Proceedings of PDSW 2022: 7th International Parallel Data Systems Workshop, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 7th IEEE/ACM International Parallel Data Systems Workshop, PDSW 2022
Country/Territory: United States
City: Dallas
Period: 11/13/22 to 11/18/22

Funding

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This material is based upon work supported by the U.S. Department of Energy, Office of Science, under contract number DE-AC02-06CH11357.

Keywords

  • FlashX
  • AsyncIO
  • HDF5
