A Versatile Simulated Data Transport Layer for in Situ Workflows Performance Evaluation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In situ processing not only allows scientific applications to cope with the explosion in data volume and velocity but also addresses the time constraints of many simulation-analysis workflows by providing scientists with early insights into their applications at runtime. Multiple frameworks implement the concept of a data transport layer (DTL) to enable such in situ workflows. These tools are very versatile: they can directly or indirectly access data generated on the same node, on another node of the same compute cluster, or on a completely distinct node, and they allow data publishers and subscribers to run on the same computing resources or on separate ones. This versatility places on researchers the onus of making key decisions about resource allocation and data transport to ensure the most efficient execution of their in situ workflows. However, domain scientists and workflow practitioners lack the appropriate tools to assess the respective performance of particular design and deployment options. In this paper, we introduce a versatile simulated DTL designed to provide researchers with insights into the respective performance of different execution scenarios of in situ workflows. This open-source, standalone library builds on the SimGrid toolkit and can be linked to any SimGrid-based simulator. It facilitates the evaluation, at scale, of the performance behavior of different data transport configurations and the study of the effects of resource allocation strategies. We demonstrate the scalability, versatility, and accuracy of this simulated DTL by reproducing the execution of two synthetic benchmarks and of a real-world in situ workflow composed of an MPI application and a parallel data analysis. Results of simulations run on a single core show that the proposed library can simulate the interactions of tens of thousands of simulated processes deployed on two interconnected commodity clusters in a few seconds, and the execution of an in situ workflow by a thousand simulated processes in less than three minutes.

Original language: English
Title of host publication: Proceedings of the 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331530198
State: Published - 2025
Event: 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025 - Edinburgh, United Kingdom
Duration: Sep 3 2025 → Sep 5 2025

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Conference

Conference: 2025 IEEE International Conference on Cluster Computing, CLUSTER 2025
Country/Territory: United Kingdom
City: Edinburgh
Period: 09/3/25 → 09/5/25

Funding

This research is partially supported by U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number "ERKJ414 - Resilient Federated Workflows in a Heterogeneous Computing Environment" and Laboratory Directed Research and Development Strategic Hire funding No. 11134 from Oak Ridge National Laboratory, provided by the Director. Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation.
