SWARM: Reimagining scientific workflow management systems in a distributed world

Prasanna Balaprakash, Krishnan Raghavan, Franck Cappello, Ewa Deelman, Anirban Mandal, Hongwei Jin, Imtiaz Mahmud, Komal Thareja, Shixun Wu, Pawel Zuk, Mariam Kiran, Zizhong Chen, Sheng Di, Kesheng Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Modern scientific workflows process massive amounts of data from diverse instruments and sensors, leveraging geographically distributed, heterogeneous compute and storage resources, from leadership-class systems to edge devices, connected by high-performance networks. This diversity makes it challenging to harness the resources' full potential, and resilience issues arise across applications, system software, networks, storage, and hardware. Today, workflow management systems (WMS) coordinate the execution of computation and data management tasks across target resources. However, the centralized design of current WMS makes them vulnerable to faults and scalability bottlenecks that can cause entire computational campaigns to fail. This paper introduces a novel agentic framework for workflow management that fully distributes and decentralizes WMS functions, modeling them as swarm intelligence agents that combine advanced artificial intelligence techniques with traditional distributed computing algorithms and can make coordinated decisions in the presence of failures of the underlying cyberinfrastructure.
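The abstract does not specify how the agents coordinate, so the following is only a generic illustration of decentralized work assignment, not the authors' method. It uses rendezvous (highest-random-weight) hashing, a technique chosen here for illustration: each agent independently computes which live agent owns a task, so no central scheduler is required, and when an agent fails only that agent's tasks are reassigned. The sketch assumes all surviving agents already agree on the membership set; in practice that agreement would itself require a failure detector or consensus protocol. The agent names and task IDs are hypothetical.

```python
import hashlib


def score(agent: str, task: str) -> int:
    # Deterministic pseudo-random weight for an (agent, task) pair.
    digest = hashlib.sha256(f"{agent}:{task}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(task: str, live_agents: set) -> str:
    # Any agent can evaluate this locally; given the same (assumed agreed)
    # membership view, every agent reaches the same answer.
    # Sorting makes tie-breaking deterministic.
    return max(sorted(live_agents), key=lambda a: score(a, task))


def assign(tasks, live_agents):
    return {t: owner(t, live_agents) for t in tasks}


# Hypothetical agents spanning HPC, cloud, and edge resources.
tasks = [f"task-{i}" for i in range(8)]
agents = {"edge-1", "hpc-1", "cloud-1"}

before = assign(tasks, agents)
# Simulate the failure of one agent: survivors recompute ownership locally.
after = assign(tasks, agents - {"hpc-1"})
moved = [t for t in tasks if before[t] != after[t]]
```

Removing an agent cannot change the maximum for a task whose previous owner survives, so only the failed agent's tasks move. This keeps reassignment traffic proportional to the work that was actually lost.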

Original language: English
Article number: 10943420251339317
Journal: International Journal of High Performance Computing Applications
State: Accepted/In press, 2025

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The SWARM project is supported by the Department of Energy Award #DE-SC0024387.

Keywords

  • agentic AI
  • consensus algorithms
  • large language models
  • network overlays
  • resilience
  • swarm intelligence
  • workflow systems
