In-memory staging and data-centric task placement for coupled scientific simulation workflows

Fan Zhang, Tong Jin, Qian Sun, Melissa Romanus, Hoang Bui, Scott Klasky, Manish Parashar

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Coupled scientific simulation workflows are composed of heterogeneous component applications that simulate different aspects of the physical phenomena being modeled and that interact and exchange significant volumes of data at runtime. As data volumes and generation rates keep growing, the traditional disk I/O-based approach to data movement becomes cost prohibitive, and workflows require a more scalable and efficient approach to support data movement. Moreover, the cost of moving large volumes of data over the system interconnection network becomes dominant and significantly impacts workflow execution time. Minimizing the amount of network data movement and localizing data transfers are critical for reducing this cost. To achieve this, workflow task placement should exploit data locality to the extent possible and move computation closer to the data. In this paper, we investigate applying in-memory data staging and data-centric task placement to reduce the data movement cost in large-scale coupled simulation workflows. Specifically, we present a distributed data sharing and task execution framework that (1) co-locates in-memory data staging on application compute nodes to store data that needs to be shared or exchanged and (2) uses data-centric task placement to map computations onto processor cores such that a large portion of the data exchanges can be performed using intra-node shared memory. We also present the implementation of the framework and its experimental evaluation on the Titan Cray XK7 petascale supercomputer.
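The data-centric placement idea in the abstract can be illustrated with a small Python sketch. This is not the algorithm from the paper: the function name `place_tasks`, the input dictionaries, and the greedy largest-input-first heuristic are all assumptions made for illustration. Given how many bytes of each task's input are already staged in each node's memory, the sketch assigns each task to the free node holding most of its input and reports how much data could then be read through intra-node shared memory rather than over the network.

```python
def place_tasks(data_bytes, cores_per_node):
    """Greedy data-centric placement (illustrative heuristic, not the paper's method).

    data_bytes[task][node] -> bytes of the task's input staged in that node's memory
    cores_per_node[node]   -> number of free cores on that node
    Returns (placement, local_bytes, total_bytes).
    """
    free = dict(cores_per_node)
    placement = {}
    local_bytes = 0
    total_bytes = 0

    # Tasks with the most input data choose first, since misplacing them costs the most.
    order = sorted(data_bytes, key=lambda t: sum(data_bytes[t].values()), reverse=True)

    for task in order:
        inputs = data_bytes[task]
        candidates = sorted(n for n in free if free[n] > 0)
        if not candidates:
            raise ValueError("not enough free cores for all tasks")
        # Pick the node that already holds the largest share of this task's input.
        best = max(candidates, key=lambda n: inputs.get(n, 0))
        placement[task] = best
        free[best] -= 1
        local_bytes += inputs.get(best, 0)
        total_bytes += sum(inputs.values())

    return placement, local_bytes, total_bytes


# Hypothetical example: three consumer tasks, two nodes with staged data.
example_data = {
    "analysis0": {"node0": 800, "node1": 200},
    "analysis1": {"node0": 100, "node1": 900},
    "viz0": {"node0": 500, "node1": 500},
}
example_cores = {"node0": 2, "node1": 1}

placement, local_bytes, total_bytes = place_tasks(example_data, example_cores)
print(placement)
print(f"{local_bytes} of {total_bytes} bytes readable from node-local memory")
```

A greedy heuristic like this is easy to reason about but does not guarantee an optimal mapping; it is meant only to show how locality information can drive the placement decision.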

Original language: English
Article number: e4147
Journal: Concurrency and Computation: Practice and Experience
Volume: 29
Issue number: 12
DOIs
State: Published - Jun 25 2017

Funding

The research presented in this work is supported in part by the National Science Foundation (NSF) via grant numbers ACI 1339036, ACI 1310283, CNS 1305375, and DMS 1228203; by the Office of Advanced Scientific Computing Research, Office of Science, of the US Department of Energy through the SciDAC Institute for Scalable Data Management, Analysis and Visualization (SDAV) under award number DE-SC0007455; the RSVP award via subcontract number 4000126989 from UT Battelle; the ASCR and FES Partnership for Edge Physics Simulations (EPSI) under award number DE-FG02-06ER54857; and the ExaCT Combustion Co-Design Center via subcontract number 4000110839 from UT Battelle. The research at Rutgers was conducted as part of the Rutgers Discovery Informatics Institute (RDI2).

Funders                                        Funder number
EPSI                                           DE-FG02-06ER54857
ExaCT Combustion Co-Design Center              4000110839
FES Partnership for Edge Physics Simulations
National Science Foundation                    DMS 1228203, CNS 1305375, ACI 1310283, ACI 1339036
U.S. Department of Energy                      DE-SC0007455
Office of Science
Advanced Scientific Computing Research
RSVP                                           4000126989
UT-Battelle

    Keywords

    • coupled simulations
    • data staging
    • data-centric task mapping
    • data-intensive application workflows
