Abstract
The term “scientific workflow” has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow systems have been developed to manage and run these scientific workflows. However, no turnkey solution has emerged from the field to address the diversity of scientific processes and the infrastructure on which they are supposed to be implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new workflow system. A direct consequence of this situation is that many existing workflow management systems (WMSs) share some salient features, offer similar functionalities, and can manage the same categories of workflows but at the same time also have some distinct capabilities that can be important for specific applications. This situation makes researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the computing and storage resources available to them, or other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined their efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of five axes: workflow structure and characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs.
Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
| Original language | English |
|---|---|
| Article number | 107974 |
| Journal | Future Generation Computer Systems |
| Volume | 174 |
| DOIs | |
| State | Published - Jan 2026 |
Funding
This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. BSC authors acknowledge projects CEX2021-001148-S and PID2023-147979NB-C21 from the MCIN/AEI and MICIU/AEI/10.13039/501100011033 and by FEDER, UE, and by the Departament de Recerca i Universitats de la Generalitat de Catalunya, research group MPiEDist (2021 SGR 00412). Ewa Deelman is funded by the U.S. Department of Energy, United States under grant No. DE-SC0024387 and by the U.S. National Science Foundation under grant No. 2138286. This work was performed under the auspices of the US Department of Energy (DOE) by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. This work has been supported by the LDRD at Lawrence Livermore National Laboratory (24-SI-005). Giovanni Pizzi acknowledges financial support from the NCCR MARVEL, a National Centre of Competence in Research, funded by the Swiss National Science Foundation, Switzerland (grant number 205602), by the Open Research Data Program of the ETH Board (project “PREMISE”: Open and Reproducible Materials Science Research) and by the SwissTwins project, funded by the Swiss State Secretariat for Education, Research and Innovation (SERI). Bartosz Balis is funded by the European Union through the Horizon Europe CLOUDSTARS project (101086248). Douglas Thain acknowledges support from National Science Foundation, United States Grant OCI-2411436. Thain, Chard, Jha, and da Silva acknowledge support from National Science Foundation, United States grant TIP-2346119.
The authors express their deepest appreciation for the insightful review and comments from Khalid Belhajjame of the University Paris-Dauphine (France); Luiz Gadelha of the German Cancer Research Center (DKFZ, Germany); Johan Gustafsson of Australian BioCommons and Sehrish Kanwal of the Centre for Cancer Research at the University of Melbourne (Australia); and Mahnoor Zulfiqar and Stuart Owen of the University of Manchester (United Kingdom).
This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doepublic-access-plan).
Keywords
- Community-based terminology
- Scientific workflows
- Workflow management systems