Reproducibility of execution environments in computational science using Semantics and Clouds

Idafen Santana-Perez, Rafael Ferreira da Silva, Mats Rynge, Ewa Deelman, María S. Pérez-Hernández, Oscar Corcho

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

In past decades, one of the most common ways of addressing reproducibility in scientific workflow-based computational science has been to track the provenance of the produced and published results. Such provenance allows intermediate and final results to be inspected, improves understanding, and permits a workflow execution to be replayed. Nevertheless, this approach provides no means of capturing and sharing the valuable knowledge about the experimental equipment of a computational experiment, i.e., the execution environment in which the experiments are conducted. In this work, we propose a novel approach based on semantic vocabularies that describe the execution environment of scientific workflows, so as to preserve it. We define a process for documenting the workflow application and its related management system, as well as their dependencies. We then apply this approach to three real workflow applications running in three distinct scenarios, using public, private, and local Cloud platforms. In particular, we study one astronomy workflow and two life science workflows for genomic information analysis. Experimental results show that our approach can reproduce an environment equivalent to that of a predefined virtual machine image on all evaluated computing platforms.
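As an illustration only, the sketch below shows one way documented dependencies could be recorded as subject-predicate-object triples and turned into a deployment order for a fresh machine. The `ex:` terms and component names are hypothetical placeholders, not the vocabularies defined in the paper. The ordering step uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical environment description as (subject, predicate, object) triples.
# The "ex:" terms are illustrative placeholders, not the paper's vocabulary.
TRIPLES = [
    ("ex:MyWorkflow", "ex:managedBy", "ex:WorkflowManager"),
    ("ex:MyWorkflow", "ex:dependsOn", "ex:AnalysisTool"),
    ("ex:WorkflowManager", "ex:dependsOn", "ex:Python"),
    ("ex:WorkflowManager", "ex:dependsOn", "ex:Scheduler"),
    ("ex:AnalysisTool", "ex:dependsOn", "ex:Python"),
]

# Predicates treated (by assumption) as "object must be deployed before subject".
DEPENDENCY_PREDICATES = {"ex:dependsOn", "ex:managedBy"}


def deployment_order(triples):
    """Return components ordered so every dependency precedes its dependents."""
    graph = {}
    for subject, predicate, obj in triples:
        if predicate in DEPENDENCY_PREDICATES:
            # TopologicalSorter expects a mapping: node -> set of predecessors.
            graph.setdefault(subject, set()).add(obj)
            graph.setdefault(obj, set())
    # static_order() raises graphlib.CycleError if the description is cyclic.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for component in deployment_order(TRIPLES):
        print(component)
```

A topological sort is used here because any valid install sequence must place shared dependencies (such as the interpreter) before every component that needs them; the exact order among independent components is not significant.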

Original language: English
Pages (from-to): 354-367
Number of pages: 14
Journal: Future Generation Computer Systems
Volume: 67
State: Published - Feb 1 2017
Externally published: Yes

Funding

This material is based upon work supported in part by the National Science Foundation under Grant No. 0910812 to Indiana University for “FutureGrid: An Experimental, High-Performance Grid Test-bed”, the FPU grant from the Spanish Science and Innovation Ministry (MICINN), and the Ministerio de Economía y Competitividad (Spain) project “4V: Volumen, Velocidad, Variedad y Validez en la Gestión Innovadora de Datos” (4V: Volume, Velocity, Variety and Validity in Innovative Data Management; TIN2013-46238-C4-2-R). This research was also supported by the National Science Foundation under the SI–SSI program, award number 1148515. We also thank Gideon Juve and Karan Vahi for their valuable help.

Keywords

  • Life sciences
  • Reproducibility
  • Scientific workflow
  • Semantic metadata
