Stacker: An autonomic data movement engine for extreme-scale data staging-based in-situ workflows

Pradeep Subedi, Philip Davis, Shaohua Duan, Scott Klasky, Hemanth Kolla, Manish Parashar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

43 Scopus citations

Abstract

Data staging and in-situ workflows are being explored extensively as an approach to addressing data-related costs at very large scales. However, understanding the impact of emerging storage architectures (e.g., deep memory hierarchies and burst buffers) on data staging solutions remains a challenge. In this paper, we investigate how data staging solutions can use burst buffers effectively, for example as a persistent storage tier of the memory hierarchy. Furthermore, we use machine-learning-based prefetching techniques to move data between storage levels autonomously. We also present Stacker, a prototype of the proposed solutions implemented within the DataSpaces data staging service, and experimentally evaluate its performance and scalability using the S3D combustion workflow on current leadership-class platforms. Our experiments demonstrate that, for production scientific workflows, Stacker achieves low-latency, high-volume data staging with low overhead compared to in-memory staging services.

Original language: English
Title of host publication: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 920-930
Number of pages: 11
ISBN (Electronic): 9781538683842
State: Published - Jul 2 2018
Event: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018 - Dallas, United States
Duration: Nov 11 2018 to Nov 16 2018

Publication series

Name: Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018

Conference

Conference: 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
Country/Territory: United States
City: Dallas
Period: 11/11/18 to 11/16/18

Funding

Acknowledgement: We would like to thank all of the reviewers for their valuable feedback and comments. The research presented in this paper is based upon work by the RAPIDS Institute, supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program; by the SIRIUS grant (number DE-SC0015160); and by the National Science Foundation (NSF) via grant number IIS 1546145. The research at Rutgers was conducted as part of the Rutgers Discovery Informatics Institute (RDI2).

Keywords

  • Data Prefetching
  • Extreme Scale Data Staging
  • High Performance Computing
  • Machine Learning
