TY - GEN
T1 - Improving Multisite Workflow Performance Using Model-Based Scheduling
AU - Maheshwari, Ketan
AU - Jung, Eun Sung
AU - Meng, Jiayuan
AU - Vishwanath, Venkatram
AU - Kettimuthu, Rajkumar
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/11/13
Y1 - 2014/11/13
N2 - Workflows play an important role in expressing and executing scientific applications. In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple resources that are geographically distributed. These computational sites are heterogeneous in nature, and the performance of different tasks in a workflow varies from one site to another. Additionally, users typically have a limited resource allocation at each site. In such cases, a judicious scheduling strategy is required to map tasks in the workflow to resources so that the workload is balanced among sites and the data transfer overhead is minimized. Most existing systems either run the entire workflow at a single site, use naive approaches to distribute the tasks across sites, or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss in productivity for a scientist. In this paper, we propose a multi-site workflow scheduling technique that uses performance models to predict the execution time on different resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach using real-world applications in a distributed environment using the Swift distributed execution framework and show that our approach improves the execution time by up to 60% compared to the default schedule.
KW - Clouds
KW - Distributed computing
KW - Parallel programming
KW - Resource modeling
KW - Scripting
KW - Swift
UR - http://www.scopus.com/inward/record.url?scp=84932632912&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2014.22
DO - 10.1109/ICPP.2014.22
M3 - Conference contribution
AN - SCOPUS:84932632912
T3 - Proceedings of the International Conference on Parallel Processing
SP - 131
EP - 140
BT - Proceedings - 43rd International Conference on Parallel Processing, ICPP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd International Conference on Parallel Processing, ICPP 2014
Y2 - 9 September 2014 through 12 September 2014
ER -