Evaluating Scientific Workflow Engines for Data and Compute Intensive Discoveries

Rina Singh, Jeffrey A. Graves, Valentine Anantharaj, Sreenivas R. Sukumar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Workflow engines used to script scientific experiments involving numerical simulation, data analysis, instruments, edge sensors, and artificial intelligence have to deal with the complexities of hardware, software, resource availability, and the collaborative nature of science. In this paper, we survey workflow engines used in data-intensive and compute-intensive discovery pipelines from scientific disciplines such as astronomy, high energy physics, earth system science, bio-medicine, and material science and present a qualitative analysis of their respective capabilities. We compare 5 popular workflow engines and their differentiated approach to job orchestration, job launching, data management and provenance, security authentication, ease-of-use, workflow description, and scripting semantics. The comparisons presented in this paper allow practitioners to choose the appropriate engine for their scientific experiment and lead to recommendations for future work.

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4553-4560
Number of pages: 8
ISBN (Electronic): 9781728108582
State: Published - Dec 2019
Externally published: Yes
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019 → Dec 12 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country/Territory: United States
City: Los Angeles
Period: 12/9/19 → 12/12/19

Funding

This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This project was partially funded by the Laboratory Director’s Research and Development fund. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy.

Funders
UT-Battelle, LLC
U.S. Department of Energy
Office of Science

    Keywords

    • Converged Workloads
    • Data Intensive Discoveries
    • End-to-End Workflows
    • Scientific Experiments
    • Scientific Workflows
    • Workflow Engines
