Using imbalance metrics to optimize task clustering in scientific workflow executions

Weiwei Chen, Rafael Ferreira Da Silva, Ewa Deelman, Rizos Sakellariou

Research output: Contribution to journal › Article › peer-review

52 Scopus citations

Abstract

Scientific workflows can be composed of many tasks of fine computational granularity. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short-running tasks into a single job such that the scheduling overhead is reduced and the overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In this work, we examine the reasons that cause Runtime Imbalance and Dependency Imbalance in task clustering. Then, we propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balancing problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of the proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature.
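
To make the Runtime Imbalance problem concrete, the following is a minimal sketch, not the authors' implementation. It merges the tasks of one workflow level into a fixed number of jobs, once in submission order and once with a simple greedy balancing heuristic, and compares the resulting job runtimes. All function names, the fixed per-job overhead, and the use of the coefficient of variation as an imbalance measure are assumptions made for illustration; the abstract does not give the paper's metric definitions or balancing algorithms.

```python
from statistics import mean, pstdev


def cluster_in_order(runtimes, n_jobs):
    """Naive clustering: split tasks into contiguous chunks (at most n_jobs)."""
    size = -(-len(runtimes) // n_jobs)  # ceiling division
    return [runtimes[i:i + size] for i in range(0, len(runtimes), size)]


def cluster_balanced(runtimes, n_jobs):
    """Greedy balancing (assumed heuristic): longest task goes to the lightest job."""
    jobs = [[] for _ in range(n_jobs)]
    for runtime in sorted(runtimes, reverse=True):
        min(jobs, key=sum).append(runtime)
    return jobs


def job_runtimes(jobs, overhead):
    """Each clustered job pays the scheduling overhead once (assumed constant)."""
    return [sum(job) + overhead for job in jobs]


def runtime_variation(values):
    """Coefficient of variation, used here as an illustrative imbalance measure."""
    return pstdev(values) / mean(values)


# Hypothetical task runtimes (seconds) at one level, with a 5 s overhead per job.
tasks = [10, 10, 1, 1, 1, 1]
overhead = 5

naive = job_runtimes(cluster_in_order(tasks, 2), overhead)
balanced = job_runtimes(cluster_balanced(tasks, 2), overhead)

print("total overhead, unclustered:", len(tasks) * overhead)
print("total overhead, clustered:  ", 2 * overhead)
print("naive job runtimes:   ", naive, "variation:", runtime_variation(naive))
print("balanced job runtimes:", balanced, "variation:", runtime_variation(balanced))
```

In this toy case both clusterings pay the same reduced overhead, but the in-order clustering leaves one job far longer than the other, so the level finishes later than it would with balanced jobs.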

Original language: English
Pages (from-to): 69-84
Number of pages: 16
Journal: Future Generation Computer Systems
Volume: 46
State: Published - May 2015
Externally published: Yes

Funding

This work was funded by NSF IIS-0905032 and NSF FutureGrid 0910812 awards. We thank Gideon Juve, Karan Vahi, Rajiv Mayani, and Mats Rynge for their valuable help.

Keywords

  • Load balancing
  • Performance analysis
  • Scheduling
  • Scientific workflows
  • Task clustering
  • Workflow simulation
