Workflow performance improvement using model-based scheduling over multiple clusters and clouds

Ketan Maheshwari, Eun Sung Jung, Jiayuan Meng, Vitali Morozov, Venkatram Vishwanath, Rajkumar Kettimuthu

Research output: Contribution to journal; Article; peer-review

31 Scopus citations

Abstract

In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple distributed resources. These sites are heterogeneous in nature, and the performance of different tasks in a workflow varies from one site to another. Additionally, users typically have a limited resource allocation at each site, capped by administrative policies. In such cases, a judicious scheduling strategy is required to map tasks in the workflow to resources so that the workload is balanced among sites and data transfer overhead is minimized. Most existing systems either run the entire workflow at a single site, use naïve approaches to distribute the tasks across sites, or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss in productivity. We propose a multi-site workflow scheduling technique that uses performance models to predict the execution time on resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach on real-world applications using the Swift parallel and distributed execution framework. We use two distinct computational environments: geographically distributed multiple clusters and multiple clouds. We show that our approach improves resource utilization and reduces execution time when compared with the default schedule.

Original language: English
Pages (from-to): 206-218
Number of pages: 13
Journal: Future Generation Computer Systems
Volume: 54
State: Published - Jan 1 2016
Externally published: Yes

Funding

We thank Gail Pieper of Argonne for proofreading help. This work was supported in part by the US Department of Energy, Office of Science, Advanced Scientific Computing Research, and the RAMSES project under Contract DE-AC02-06CH11357.

Funders (with funder number where listed)
US Department of Energy
National Science Foundation: 1440785
Office of Science
Advanced Scientific Computing Research: DE-AC02-06CH11357

    Keywords

    • Clouds
    • Optimization
    • Swift
    • System modeling
    • Workflow
