Reusability First: Toward FAIR Workflows

Matthew Wolf, Jeremy Logan, Kshitij Mehta, Daniel Jacobson, Mikaela Cashman, Angelica M. Walker, Greg Eisenhauer, Patrick Widener, Ashley Cliff

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

12 Scopus citations

Abstract

The FAIR principles of open science (Findable, Accessible, Interoperable, and Reusable) have had transformative effects on modern large-scale computational science. In particular, they have encouraged more open access to and use of data, an important consideration as collaboration among teams of researchers accelerates and the use of workflows by those teams to solve problems increases. How best to apply the FAIR principles to workflows themselves, and software more generally, is not yet well understood. We argue that the software engineering concept of technical debt management provides a useful guide for application of those principles to workflows, and in particular that it implies reusability should be considered as 'first among equals'. Moreover, our approach recognizes a continuum of reusability where we can make explicit and selectable the tradeoffs required in workflows for both their users and developers. To this end, we propose a new abstraction approach for reusable workflows, with demonstrations for both synthetic workloads and real-world computational biology workflows. Through application of novel systems and tools that are based on this abstraction, these experimental workflows are refactored to rightsize the granularity of workflow components to efficiently fill the gap between end-user simplicity and general customizability. Our work makes it easier to selectively reason about and automate the connections between trade-offs across user and developer concerns when exposing degrees of freedom for reuse. Additionally, by exposing fine-grained reusability abstractions we enable performance optimizations, as we demonstrate on both institutional-scale and leadership-class HPC resources.

Original language: English
Title of host publication: Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 444-455
Number of pages: 12
ISBN (Electronic): 9781728196664
DOIs
State: Published - 2021
Event: 2021 IEEE International Conference on Cluster Computing, Cluster 2021 - Virtual, Portland, United States
Duration: Sep 7 2021 to Sep 10 2021

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2021-September
ISSN (Print): 1552-5244

Conference

Conference: 2021 IEEE International Conference on Cluster Computing, Cluster 2021
Country/Territory: United States
City: Virtual, Portland
Period: 09/7/21 to 09/10/21

Funding

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. SAND2021-9168C.

Our work makes it easier to selectively reason about and automate the connections between trade-offs across user and developer concerns when exposing degrees of freedom for reuse. Looking toward future development, we see great potential for more powerful and granular metadata representation, automation of reusable workflow composition, and applications across diverse areas of computational science (including climate, materials research, computational systems biology, and hybrid experimental/simulation platforms). Workflows represent the connections between data, computation, and human decision-making, and making them more reusable and automatable will have benefits across the science ecosystem.

ACKNOWLEDGMENT

This manuscript has been authored by UT-Battelle, LLC under contract no. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, last accessed September 16, 2020).

Funding was provided by the Plant-Microbe Interfaces (PMI) SFA, the Exascale & Petascale Networks for KBase project, and the Center for Bioenergy Innovation (CBI). These are all supported by the Genomic Sciences Program of the Office of Biological and Environmental Research in the DOE Office of Science. This work was also supported in part by the joint U.S. Department of Veterans Affairs and U.S. Department of Energy MVP CHAMPION program, and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

Funders                                     Funder number
Plant-Microbe Interfaces
U.S. Department of Energy                   17-SC-20-SC
U.S. Department of Veterans Affairs
Office of Science                           DE-AC05-00OR22725
National Nuclear Security Administration    DE-NA0003525, SAND2021-9168C
Sandia National Laboratories
Center for Bioenergy Innovation

Keywords

• Distributed Information systems
• FAIR
• Middleware
• Reusability
• Workflows