TY - GEN
T1 - Toward fine-grained online task characteristics estimation in scientific workflows
AU - Da Silva, Rafael Ferreira
AU - Juve, Gideon
AU - Deelman, Ewa
AU - Glatard, Tristan
AU - Desprez, Frédéric
AU - Thain, Douglas
AU - Tovar, Benjamín
AU - Livny, Miron
PY - 2013/11/17
Y1 - 2013/11/17
N2 - Task characteristics estimations such as runtime, disk space, and memory consumption, are commonly used by scheduling algorithms and resource provisioning techniques to provide successful and efficient workflow executions. These methods assume that accurate estimations are available, but in production systems it is hard to compute such estimates with good accuracy. In this work, we first profile three real scientific workflows collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task needs based on these profiles. Our method estimates task runtime, disk space, and memory consumption based on the size of tasks' input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets by using a clustering technique. Task behavior estimates are done based on the ratio parameter/input data size if they are correlated, or based on the mean value. However, task dependencies in scientific workflows lead to a chain of estimation errors. To correct such errors, we propose an online estimation process based on the MAPE-K loop where task executions are constantly monitored and estimates are updated accordingly. Experiment results show that our online estimation process yields much more accurate predictions than an offline approach, where all task needs are estimated at once.
AB - Task characteristics estimations such as runtime, disk space, and memory consumption, are commonly used by scheduling algorithms and resource provisioning techniques to provide successful and efficient workflow executions. These methods assume that accurate estimations are available, but in production systems it is hard to compute such estimates with good accuracy. In this work, we first profile three real scientific workflows collecting fine-grained information such as process I/O, runtime, memory usage, and CPU utilization. We then propose a method to automatically characterize workflow task needs based on these profiles. Our method estimates task runtime, disk space, and memory consumption based on the size of tasks' input data. It looks for correlations between the parameters of a dataset, and if no correlation is found, the dataset is divided into smaller subsets by using a clustering technique. Task behavior estimates are done based on the ratio parameter/input data size if they are correlated, or based on the mean value. However, task dependencies in scientific workflows lead to a chain of estimation errors. To correct such errors, we propose an online estimation process based on the MAPE-K loop where task executions are constantly monitored and estimates are updated accordingly. Experiment results show that our online estimation process yields much more accurate predictions than an offline approach, where all task needs are estimated at once.
KW - MAPE-K loop
KW - Online task estimation
KW - Scientific workflow
KW - Workflow characterization
UR - https://www.scopus.com/pages/publications/84997796211
U2 - 10.1145/2534248.2534254
DO - 10.1145/2534248.2534254
M3 - Conference contribution
AN - SCOPUS:84997796211
T3 - Proceedings of WORKS 2013: 8th Workshop on Workflows in Support of Large-Scale Science - Held in conjunction with SC 2013: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 58
EP - 67
BT - Proceedings of WORKS 2013
PB - Association for Computing Machinery, Inc
T2 - 8th Workshop on Workflows in Support of Large-Scale Science, WORKS 2013
Y2 - 17 November 2013
ER -