Abstract
Science applications frequently produce and consume large volumes of data, but delivering this data to and from compute resources can be challenging because parallel file system performance is not keeping pace with compute and memory performance. To mitigate this I/O bottleneck, some systems have deployed burst buffers, but their impact on the performance of real-world scientific workflow applications is still unclear. In this paper, we examine the impact of burst buffers using the remote-shared, allocatable burst buffers on the Cori system at NERSC. We run two data-intensive workflows: a high-throughput genome analysis workflow and a subset of the SCEC high-performance CyberShake workflow, a production seismic hazard analysis workflow. We find that burst buffers improve read and write performance by an order of magnitude, and that these improvements increase job performance, and thereby overall workflow performance, even for long-running CPU-bound jobs.
Original language | English |
---|---|
Pages (from-to) | 208-220 |
Number of pages | 13 |
Journal | Future Generation Computer Systems |
Volume | 101 |
State | Published - Dec 2019 |
Externally published | Yes |
Funding
This work was funded by DOE contract number #DESC0012636, “Panorama – Predictive Modeling and Diagnostic Monitoring of Extreme Science Workflows”, by NSF, USA contract number #1664162, “SI2-SSI: Pegasus: Automating Compute and Data Intensive Science”, and by NSF contract number #1741040, “BIGDATA: IA: Collaborative Research: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows”. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, United States under Contract No. DE-AC02-05CH11231. CyberShake workflow research was supported by the National Science Foundation (NSF), USA under the OAC SI2-SSI grant #1148493, the OAC SI2-SSI grant #1450451, and EAR grant #1226343. This research was supported by the Southern California Earthquake Center, USA (Contribution No. 7610). SCEC is funded by NSF Cooperative Agreement EAR-1033462 & USGS Cooperative Agreement G12AC20038.
Funders | Funder number |
---|---|
DOE Office of Science | |
National Science Foundation | EAR-1033462, 1664162, 1841758, 1741040, 1450451, G12AC20038, 1148493 |
U.S. Department of Energy | DE-AC02-05CH11231, #DESC0012636, 0012636 |
Division of Earth Sciences | 1226343 |
U.S. Geological Survey | |
Office of Science | |
Southern California Earthquake Center | 7610 |
Keywords
- Burst buffers
- High-performance computing
- In transit processing
- Scientific workflows