TY - GEN
T1 - Reusability First
T2 - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
AU - Wolf, Matthew
AU - Logan, Jeremy
AU - Mehta, Kshitij
AU - Jacobson, Daniel
AU - Cashman, Mikaela
AU - Walker, Angelica M.
AU - Eisenhauer, Greg
AU - Widener, Patrick
AU - Cliff, Ashley
N1 - Publisher Copyright: ©2021 IEEE.
PY - 2021
Y1 - 2021
AB - The FAIR principles of open science (Findable, Accessible, Interoperable, and Reusable) have had transformative effects on modern large-scale computational science. In particular, they have encouraged more open access to and use of data, an important consideration as collaboration among teams of researchers accelerates and the use of workflows by those teams to solve problems increases. How best to apply the FAIR principles to workflows themselves, and software more generally, is not yet well understood. We argue that the software engineering concept of technical debt management provides a useful guide for application of those principles to workflows, and in particular that it implies reusability should be considered as 'first among equals'. Moreover, our approach recognizes a continuum of reusability where we can make explicit and selectable the tradeoffs required in workflows for both their users and developers. To this end, we propose a new abstraction approach for reusable workflows, with demonstrations for both synthetic workloads and real-world computational biology workflows. Through application of novel systems and tools that are based on this abstraction, these experimental workflows are refactored to rightsize the granularity of workflow components to efficiently fill the gap between end-user simplicity and general customizability. Our work makes it easier to selectively reason about and automate the connections between trade-offs across user and developer concerns when exposing degrees of freedom for reuse. Additionally, by exposing fine-grained reusability abstractions we enable performance optimizations, as we demonstrate on both institutional-scale and leadership-class HPC resources.
KW - Distributed information systems
KW - FAIR
KW - Middleware
KW - Reusability
KW - Workflows
UR - http://www.scopus.com/inward/record.url?scp=85125298378&partnerID=8YFLogxK
U2 - 10.1109/Cluster48925.2021.00053
DO - 10.1109/Cluster48925.2021.00053
M3 - Conference contribution
AN - SCOPUS:85125298378
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 444
EP - 455
BT - Proceedings - 2021 IEEE International Conference on Cluster Computing, Cluster 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 September 2021 through 10 September 2021
ER -