Abstract
There has been a recent emergence of new workflow applications focused on data analytics and machine learning. This emergence has precipitated a change in the workflow management landscape, causing the development of new dataoriented workflow management systems (WMSs) in addition to the earlier standard of task-oriented WMSs. In this paper, we summarize three general workflow use-cases and explore the unique requirements of each use-case in order to understand how WMSs from both workflow management models meet the requirements of each workflow use-case from the user's perspective. We analyze the applicability of the two models by carefully describing each model and by providing an examination of the different variations of WMSs that fall under the task driven model. To illustrate the strengths and weaknesses of each workflow management model, we summarize the key features of four production-ready WMSs: Pegasus, Makeflow, Apache Airflow, and Pachyderm. To deepen our analysis of the four WMSs examined in this paper,we implement three real-world use-cases to highlight the specifications and features of each WMS. We present our final assessment of each WMS after considering the following factors: usability, performance, ease of deployment, and relevance. The purpose of this work is to offer insights from the user's perspective into the research challenges that WMSs currently face due to the evolving workflow landscape.
Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4537-4544 |
Number of pages | 8 |
ISBN (Electronic) | 9781728108582 |
DOIs | |
State | Published - Dec 2019 |
Event | 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States Duration: Dec 9 2019 → Dec 12 2019 |
Publication series
Name | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|
Conference
Conference | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Country/Territory | United States |
City | Los Angeles |
Period | 12/9/19 → 12/12/19 |
Funding
Acknowledgments. This work is funded by NSF contract #1842042: “Pilot Study for a Cyberinfrastructure Center of Excellence”. The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute.
Keywords
- Data-driven.
- Scientific workflow
- Task-driven
- Workflow Management System