Abstract
Scientific workflows can be composed of many fine-grained computational tasks. The runtime of these tasks may be shorter than the duration of system overheads, for example, when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short-running tasks into a single job so that scheduling overhead is reduced and overall runtime performance is improved. However, existing task clustering strategies provide only a coarse-grained approach that relies on an oversimplified workflow model. In this work, we examine the causes of Runtime Imbalance and Dependency Imbalance in task clustering. We then propose quantitative metrics to evaluate the severity of the two imbalance problems. Furthermore, we propose a series of task balancing methods (horizontal and vertical) to address the load balance problem when performing task clustering for five widely used scientific workflows. Finally, we analyze the relationship between these metric values and the performance of the proposed task balancing methods. A trace-based simulation shows that our methods can significantly decrease the runtime of workflow applications when compared to a baseline execution. We also compare the performance of our methods with two algorithms described in the literature.
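To make the idea of runtime imbalance concrete, the following is a minimal illustrative sketch and not the paper's algorithms or metrics. It assumes a single workflow level of independent tasks, a fixed per-job scheduling overhead, and jobs that run in parallel. The function names (`cluster_round_robin`, `cluster_balanced`, `makespan`) and the sample runtimes are hypothetical. The balanced variant uses a generic greedy longest-task-first heuristic as a stand-in for a runtime-aware clustering method.

```python
import heapq


def cluster_round_robin(runtimes, num_jobs):
    """Merge tasks into num_jobs jobs in arrival order, ignoring runtimes."""
    jobs = [[] for _ in range(num_jobs)]
    for i, rt in enumerate(runtimes):
        jobs[i % num_jobs].append(rt)
    return jobs


def cluster_balanced(runtimes, num_jobs):
    """Greedy heuristic: put the longest remaining task into the least-loaded job."""
    # Heap of (current load, job index); the least-loaded job is popped first.
    heap = [(0, j) for j in range(num_jobs)]
    heapq.heapify(heap)
    jobs = [[] for _ in range(num_jobs)]
    for rt in sorted(runtimes, reverse=True):
        load, j = heapq.heappop(heap)
        jobs[j].append(rt)
        heapq.heappush(heap, (load + rt, j))
    return jobs


def makespan(jobs, overhead):
    """Finish time of the level: each job pays one overhead; jobs run in parallel."""
    return max(overhead + sum(job) for job in jobs)


# Hypothetical task runtimes (seconds) with a mix of long and short tasks.
runtimes = [10, 1, 10, 1, 10, 1, 10, 1]
overhead = 5  # assumed per-job scheduling overhead (seconds)

naive = cluster_round_robin(runtimes, 2)
balanced = cluster_balanced(runtimes, 2)
print(makespan(naive, overhead), makespan(balanced, overhead))
```

Under these assumptions, both clusterings pay the overhead only twice instead of eight times, but the runtime-unaware grouping puts all long tasks in one job, so the level finishes later than with the balanced grouping.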
Original language | English |
---|---|
Pages (from-to) | 69-84 |
Number of pages | 16 |
Journal | Future Generation Computer Systems |
Volume | 46 |
State | Published - May 2015 |
Externally published | Yes |
Funding
This work was funded by NSF IIS-0905032 and NSF FutureGrid 0910812 awards. We thank Gideon Juve, Karan Vahi, Rajiv Mayani, and Mats Rynge for their valuable help.
Keywords
- Load balancing
- Performance analysis
- Scheduling
- Scientific workflows
- Task clustering
- Workflow simulation