Abstract
The promise of easy access to a virtually unlimited number of resources makes Infrastructure as a Service (IaaS) clouds a good candidate for the execution of data-intensive workflow applications composed of hundreds of computational tasks. With careful execution planning, workflow management systems can build a tailored compute infrastructure by combining a set of virtual machine instances. However, these applications usually rely on files to handle dependencies between tasks, so a storage space shared by all virtual machines may become a bottleneck and degrade the application execution time. In this article, we propose an original data-aware planning algorithm that leverages two characteristics of a family of virtual machine instances, namely a large number of cores and a dedicated storage space on fast SSD drives, to improve data locality and thereby reduce the amount of data transferred over the network during the execution of a workflow. We also propose a simulation-driven approach to solve a cost-performance optimization problem and correctly dimension the virtual infrastructure on which to execute a given workflow. Experiments conducted with real application workflows show the benefits of the presented algorithms. The data-aware planning leads to a clear reduction of both execution time and volume of data transferred over the network, while the simulation-driven approach allows us to dimension the infrastructure in a reasonable time.
Original language | English |
---|---|
Article number | e6719 |
Journal | Concurrency and Computation: Practice and Experience |
Volume | 34 |
Issue number | 14 |
State | Published - Jun 25 2022 |
Externally published | Yes |
Funding
The authors would like to thank Rafael Ferreira da Silva, Henri Casanova, and the entire WRENCH development team for their valuable help in the design of the proposed WRENCH-based simulator.
Keywords
- IaaS cloud
- data-intensive workflows
- makespan reduction
- workflow scheduling