TY - GEN
T1 - Custom execution environments with containers in Pegasus-enabled scientific workflows
AU - Vahi, Karan
AU - Zink, Michael
AU - Rynge, Mats
AU - Papadimitriou, George
AU - Brown, Duncan
AU - Mayani, Rajiv
AU - Ferreira Da Silva, Rafael
AU - Deelman, Ewa
AU - Mandal, Anirban
AU - Lyons, Eric
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Science reproducibility is a cornerstone feature in scientific workflows. In most cases, this has been implemented as a way to exactly reproduce the computational steps taken to reach the final results. While these steps are often completely described, including the input parameters, datasets, and codes, the environment in which these steps are executed is only described at a higher level with endpoints and operating system name and versions. Though this may be sufficient for reproducibility in the short term, systems evolve and are replaced over time, breaking the underlying workflow reproducibility. A natural solution to this problem is containers, as they are well defined, have a lifetime independent of the underlying system, and can be user-controlled so that they can provide custom environments if needed. This paper highlights some unique challenges that may arise when using containers in distributed scientific workflows. Further, this paper explores how the Pegasus Workflow Management System implements container support to address such challenges.
KW - Containers
KW - Distributed computing
KW - Docker
KW - Pegasus
KW - Reproducibility
KW - Scientific workflows
KW - Shifter
KW - Singularity
UR - http://www.scopus.com/inward/record.url?scp=85083168946&partnerID=8YFLogxK
U2 - 10.1109/eScience.2019.00039
DO - 10.1109/eScience.2019.00039
M3 - Conference contribution
AN - SCOPUS:85083168946
T3 - Proceedings - IEEE 15th International Conference on eScience, eScience 2019
SP - 281
EP - 290
BT - Proceedings - IEEE 15th International Conference on eScience, eScience 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on eScience, eScience 2019
Y2 - 24 September 2019 through 27 September 2019
ER -