TY - GEN
T1 - A cleanup algorithm for implementing storage constraints in scientific workflow executions
AU - Srinivasan, Sudarshan
AU - Juve, Gideon
AU - Da Silva, Rafael Ferreira
AU - Vahi, Karan
AU - Deelman, Ewa
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014
Y1 - 2014
AB - Scientific workflows are often used to automate large-scale data analysis pipelines on clusters, grids, and clouds. However, because workflows can be extremely data-intensive, and are often executed on shared resources, it is critical to be able to limit or minimize the amount of disk space that workflows use on shared storage systems. This paper proposes a novel and simple approach that constrains the amount of storage space used by a workflow by inserting data cleanup tasks into the workflow task graph. Unlike previous solutions, the proposed approach provides guaranteed limits on disk usage, requires no new functionality in the underlying workflow scheduler, and does not require estimates of task runtimes. Experimental results show that this algorithm significantly reduces the number of cleanup tasks added to a workflow and yields better workflow makespans than the strategy currently used by the Pegasus Workflow Management System.
UR - https://www.scopus.com/pages/publications/84988222800
U2 - 10.1109/WORKS.2014.8
DO - 10.1109/WORKS.2014.8
M3 - Conference contribution
AN - SCOPUS:84988222800
T3 - Proceedings of WORKS 2014: The 9th Workshop on Workflows in Support of Large-Scale Science - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 41
EP - 49
BT - Proceedings of WORKS 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th Workshop on Workflows in Support of Large-Scale Science, WORKS 2014
Y2 - 16 November 2014
ER -