Pegasus, a workflow management system for science automation

Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J. Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira Da Silva, Miron Livny, Kent Wenger

Research output: Contribution to journal, Article, peer-review

648 Scopus citations

Abstract

Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics and others. This paper provides an integrated view of the Pegasus system, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures.

Original language: English
Pages (from-to): 17-35
Number of pages: 19
Journal: Future Generation Computer Systems
Volume: 46
DOIs
State: Published - May 2015
Externally published: Yes

Funding

This research was done using resources provided by the Open Science Grid, which is supported by the National Science Foundation and the US Department of Energy’s Office of Science. The CyberShake workflows research was supported by the Southern California Earthquake Center (SCEC). SCEC is funded by NSF Cooperative Agreement EAR-1033462 and USGS Cooperative Agreement G12AC20038. The SCEC contribution number for this paper is 1911. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575. Pegasus is funded by the National Science Foundation under the ACI SDCI program grant #0722019 and ACI SI2-SSI program grant #1148515. Pegasus has been in development since 2001 and has benefited greatly from the expertise and efforts of the people who have worked on it over the years. We would especially like to thank Gaurang Mehta, Mei-Hui Su, Jens-S. Vöckler, Fabio Silva, Gurmeet Singh, Prasanth Thomas and Arun Ramakrishnan for their efforts and contributions to Pegasus. We would also like to extend our gratitude to all the members of our user community who have used Pegasus over the years and provided valuable feedback, especially Duncan Brown, Scott Koranda, Kent Blackburn, Yu Huang, Nirav Merchant, Jonathan Livny, and Bruce Berriman.

Funders and funder numbers
US Department of Energy
National Science Foundation: EAR-1033462, ACI SI2-SSI, OCI-1053575
Directorate for Computer and Information Science and Engineering: 1148515, 0722019
U.S. Geological Survey: G12AC20038
Office of Science
Southern California Earthquake Center

    Keywords

    • Pegasus
    • Scientific workflows
    • Workflow management system
