Persistent data staging services for data intensive in-situ scientific workflows

Melissa Romanus, Fan Zhang, Tong Jin, Qian Sun, Hoang Bui, Manish Parashar, Jong Choi, Saloman Janhunen, Robert Hager, Scott Klasky, Choong Seock Chang, Ivan Rodero

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scales and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result, researchers are increasingly exploring data-staging and in-situ workflows to reduce data movement and data-related overheads. As these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must in turn evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability.

Original language: English
Title of host publication: DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing
Publisher: Association for Computing Machinery, Inc
Pages: 37-44
Number of pages: 8
ISBN (Electronic): 9781450343527
State: Published - Jun 1 2016
Event: 6th ACM International Workshop on Data-Intensive Distributed Computing, DIDC 2016 - Kyoto, Japan
Duration: Jun 1 2016 → …

Publication series

Name: DIDC 2016 - Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing

Conference

Conference: 6th ACM International Workshop on Data-Intensive Distributed Computing, DIDC 2016
Country/Territory: Japan
City: Kyoto
Period: 06/1/16 → …

Funding

The research presented in this work is supported in part by the National Science Foundation (NSF) via grant numbers CNS 1305375, ACI 1339036, ACI 1310283, ACI 1441376, ACI 1464317 and IIS 1546145, and by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the US Department of Energy Scientific Discovery through Advanced Computing (SciDAC) Institute for Scalable Data Management, Analysis and Visualization (SDAV) under award number DE-SC0007455, the DoE RSVP grant via subcontract number 4000126989 from UT Battelle, the Advanced Scientific Computing Research and Fusion Energy Sciences Partnership for Edge Physics Simulations (EPSI) under award number DE-FG02-06ER54857, the ExaCT Combustion Co-Design Center via subcontract number 4000110839 from UT Battelle, and via grant number DE-FOA-0001338, Storage Systems and Input/Output for Extreme Scale Science. The research at Rutgers was conducted as part of the Rutgers Discovery Informatics Institute (RDI2).

Funders (with funder numbers where listed)
Advanced Scientific Computing Research and Fusion Energy Sciences Partnership for Edge Physics Simulations
DoE RSVP: 4000126989
EPSI: DE-FG02-06ER54857
ExaCT Combustion Co-Design Center: 4000110839, DE-FOA-0001338
US Department of Energy: DE-SC0007455
National Science Foundation: CNS 1305375, ACI 1339036, ACI 1310283, ACI 1441376, ACI 1464317, IIS 1546145
Battelle
Office of Science
Advanced Scientific Computing Research
