TY - GEN
T1 - Community resources for enabling research in distributed scientific workflows
AU - Da Silva, Rafael Ferreira
AU - Chen, Weiwei
AU - Juve, Gideon
AU - Vahi, Karan
AU - Deelman, Ewa
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/2
Y1 - 2014/12/2
AB - A significant amount of recent research in scientific workflows aims to develop new techniques, algorithms, and systems that can overcome the challenges of efficient and robust execution of ever larger workflows on increasingly complex distributed infrastructures. Since the infrastructures, systems, and applications are complex, and their behavior is difficult to reproduce using physical experiments, much of this research is based on simulation. However, there is a shortage of realistic datasets and tools that can be used for such studies. In this paper we describe a collection of tools and data that have enabled research in new techniques, algorithms, and systems for scientific workflows. These resources include: 1) execution traces of real workflow applications from which workflow and system characteristics such as resource usage and failure profiles can be extracted, 2) a synthetic workflow generator that can produce realistic synthetic workflows based on profiles extracted from execution traces, and 3) a simulator framework that can simulate the execution of synthetic workflows on realistic distributed infrastructures. This paper describes how we have used these resources to investigate new techniques for efficient and robust workflow execution, as well as to provide improvements to the Pegasus Workflow Management System and other workflow tools. Our goal in describing these resources is to share them with other researchers in the workflow research community. All of the tools and data are freely available online for the community at http://www.workflowarchive.org. These data have already been leveraged for a number of studies.
KW - Scientific Workflows
KW - Workflow Simulation
KW - Workload Archive
KW - Workload Profiling and Characterization
UR - https://www.scopus.com/pages/publications/84919499464
U2 - 10.1109/eScience.2014.44
DO - 10.1109/eScience.2014.44
M3 - Conference contribution
AN - SCOPUS:84919499464
T3 - Proceedings - 2014 IEEE 10th International Conference on eScience, eScience 2014
SP - 177
EP - 184
BT - Proceedings - 2014 IEEE 10th International Conference on eScience, eScience 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE International Conference on eScience, eScience 2014
Y2 - 20 October 2014 through 24 October 2014
ER -