Data-aware and simulation-driven planning of scientific workflows on IaaS clouds

Tchimou N'Takpé, Jean Edgard Gnimassoun, Souleymane Oumtanaga, Frédéric Suter

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

The promise of easy access to a virtually unlimited number of resources makes Infrastructure as a Service (IaaS) clouds a good candidate for executing data-intensive workflow applications composed of hundreds of computational tasks. Through careful execution planning, workflow management systems can build a tailored compute infrastructure by combining a set of virtual machine instances. However, these applications usually rely on files to handle dependencies between tasks, so a storage space shared by all virtual machines may become a bottleneck and degrade the application execution time. In this article, we propose an original data-aware planning algorithm that leverages two characteristics of a family of virtual machine instances, namely a large number of cores and dedicated storage space on fast SSDs, to improve data locality and thereby reduce the amount of data transferred over the network during the execution of a workflow. We also propose a simulation-driven approach to solve a cost-performance optimization problem and correctly dimension the virtual infrastructure on which to execute a given workflow. Experiments conducted with real application workflows show the benefits of the presented algorithms: the data-aware planning leads to a clear reduction of both execution time and volume of data transferred over the network, while the simulation-driven approach allows us to dimension the infrastructure in a reasonable time.

Original language: English
Article number: e6719
Journal: Concurrency and Computation: Practice and Experience
Volume: 34
Issue number: 14
DOIs
State: Published - Jun 25 2022
Externally published: Yes

Funding

The authors would like to thank Rafael Ferreira da Silva, Henri Casanova, and all the WRENCH development team for their valuable help in the design of the proposed WRENCH-based simulator.

Keywords

  • IaaS cloud
  • data-intensive workflows
  • makespan reduction
  • workflow scheduling
