Abstract
We observe the emergence of a new generation of scientific workflows that process data produced at a sustained rate by scientific instruments and large-scale numerical simulations. This data is consumed by multiple analysis, visualization, or machine learning components, not only to enable inference and justify the scientific program, but also to monitor and steer the evolution of these experiments. In such workflows, moving intermediate data efficiently matters more to performance than efficiently scheduling computational tasks. However, most traditional workflow management systems focus on optimizing task scheduling and only then deal with data management, assuming a 'move little, compute for long' model, which makes them unfit for the efficient management of this new generation of workflows. Therefore, we advocate for a new way to manage scientific workflows. We propose an efficiently and independently managed data plane that can store and stream data. Workflow compute components in the application plane can then interact with the data plane while being abstracted from the complexities of data management. The role of a workflow management system then becomes that of a control plane that allows users to connect services together to execute the workflow and that manages connections between the application and data planes. In this position paper, we characterize several next-generation workflow motifs and describe how their interaction with the data plane challenges traditional workflow management systems. Then, we express a set of requirements that a workflow management system should meet to efficiently manage next-generation workflows at different scales. Based on these requirements, we present our vision of driving next-generation workflows from the data plane and list remaining open challenges.
Original language | English |
---|---|
Title of host publication | Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350322231 |
DOIs | |
State | Published - 2023 |
Event | 19th IEEE International Conference on e-Science, e-Science 2023 - Limassol, Cyprus. Duration: Oct 9 2023 → Oct 14 2023 |
Publication series
Name | Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023 |
---|---|
Conference
Conference | 19th IEEE International Conference on e-Science, e-Science 2023 |
---|---|
Country/Territory | Cyprus |
City | Limassol |
Period | 10/9/23 → 10/14/23 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research is partially supported by Laboratory Directed Research and Development Strategic Hire funding No. 11134 from Oak Ridge National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy.
Keywords
- Workflow management
- Data management