Abstract
Workflow engines used to script scientific experiments involving numerical simulation, data analysis, instruments, edge sensors, and artificial intelligence have to deal with the complexities of hardware, software, resource availability, and the collaborative nature of science. In this paper, we survey workflow engines used in data-intensive and compute-intensive discovery pipelines from scientific disciplines such as astronomy, high energy physics, earth system science, bio-medicine, and material science and present a qualitative analysis of their respective capabilities. We compare five popular workflow engines and their differing approaches to job orchestration, job launching, data management and provenance, security authentication, ease-of-use, workflow description, and scripting semantics. The comparisons presented in this paper allow practitioners to choose the appropriate engine for their scientific experiment and lead to recommendations for future work.
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4553-4560 |
Number of pages | 8 |
ISBN (Electronic) | 9781728108582 |
State | Published - Dec 2019 |
Externally published | Yes |
Event | 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States; Duration: Dec 9 2019 → Dec 12 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|
Conference
Conference | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Country/Territory | United States |
City | Los Angeles |
Period | 12/9/19 → 12/12/19 |
Funding
This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This project was partially funded by the Laboratory Director’s Research and Development fund. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy.
Keywords
- Converged Workloads
- Data Intensive Discoveries
- End-to-End Workflows
- Scientific Experiments
- Scientific Workflows
- Workflow Engines