Abstract
Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics and others. This paper provides an integrated view of the Pegasus system, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures.
Original language | English |
---|---|
Pages (from-to) | 17-35 |
Number of pages | 19 |
Journal | Future Generation Computer Systems |
Volume | 46 |
State | Published - May 2015 |
Externally published | Yes |
Funding
This research was done using resources provided by the Open Science Grid, which is supported by the National Science Foundation and the US Department of Energy’s Office of Science. The CyberShake workflows research was supported by the Southern California Earthquake Center (SCEC). SCEC is funded by NSF Cooperative Agreement EAR-1033462 and USGS Cooperative Agreement G12AC20038. The SCEC contribution number for this paper is 1911. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575. Pegasus is funded by the National Science Foundation under the ACI SDCI program grant #0722019 and ACI SI2-SSI program grant #1148515. Pegasus has been in development since 2001 and has benefited greatly from the expertise and efforts of the people who have worked on it over the years. We would especially like to thank Gaurang Mehta, Mei-Hui Su, Jens-S. Vöckler, Fabio Silva, Gurmeet Singh, Prasanth Thomas and Arun Ramakrishnan for their efforts and contributions to Pegasus. We would also like to extend our gratitude to all the members of our user community who have used Pegasus over the years and provided valuable feedback, especially Duncan Brown, Scott Koranda, Kent Blackburn, Yu Huang, Nirav Merchant, Jonathan Livny, and Bruce Berriman.
Funders | Funder number |
---|---|
US Department of Energy | |
National Science Foundation | EAR-1033462, ACI SI2-SSI, OCI-1053575 |
Directorate for Computer and Information Science and Engineering | 1148515, 0722019 |
U.S. Geological Survey | G12AC20038 |
Office of Science | |
Southern California Earthquake Center | |
Keywords
- Pegasus
- Scientific workflows
- Workflow management system