Workflow Submit Nodes as a Service on Leadership Class Systems

George Papadimitriou, Karan Vahi, Jason Kincl, Valentine Anantharaj, Ewa Deelman, Jack Wells

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Today, DOE scientists have access to high performance computing (HPC) facilities with very powerful systems that enable them to execute their computations faster, more efficiently, and at greater scales than ever before. To further their knowledge and produce new discoveries, scientists rely on workflows, sometimes very complex ones, that provide an easy way to automate, reproduce, and verify their computations. Historically, however, creating workflow submission environments in large HPC facilities has been cumbersome and has required expertise and many man-hours of effort because of the peculiarities, policies, and restrictions that these systems present. In this paper we discuss the approach that a large DOE facility, the Oak Ridge Leadership Computing Facility (OLCF), is taking to provide containers as a service to its users. This capability is used to create Pegasus workflow management system submit nodes as a service (WSaaS) at OLCF, targeting the Summit supercomputer. The deployment builds upon the Kubernetes/OpenShift cluster (Slate) that exists within OLCF's DMZ and upon its automation triggers. Additionally, we evaluate the overhead of our approach and the effort needed to deploy it, compared with previous solutions such as setting up a Pegasus submission environment on OLCF's login nodes or submitting jobs remotely via the rvGAHP.
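
The abstract describes running Pegasus submit nodes as containers on a Kubernetes/OpenShift cluster (Slate). As a rough illustration only, the sketch below builds a generic Kubernetes apps/v1 Deployment manifest for a single long-lived submit-node pod. The deployment name, namespace, and container image are hypothetical placeholders, and the manifest is not OLCF's actual Slate configuration: it deliberately omits the storage mounts, credentials, and batch-system access that a real submit node targeting Summit would need.

```python
import json


def submit_node_deployment(name: str, namespace: str, image: str) -> dict:
    """Build a generic Kubernetes apps/v1 Deployment manifest for one
    workflow submit-node container. All argument values are hypothetical."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            # A single replica: one submit-node pod (assumed one per user or project).
            "replicas": 1,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "submit-node",
                            # Hypothetical image assumed to bundle Pegasus and HTCondor.
                            "image": image,
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = submit_node_deployment(
        name="pegasus-submit",  # hypothetical
        namespace="my-project",  # hypothetical
        image="registry.example.org/pegasus-submit:latest",  # hypothetical
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f` (or `oc apply -f` on OpenShift), since both accept JSON as well as YAML manifests. A Deployment rather than a bare Pod is used here so that the cluster restarts the submit node if it fails.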

Original language: English
Title of host publication: PEARC 2020 - Practice and Experience in Advanced Research Computing 2020
Subtitle of host publication: Catch the Wave
Publisher: Association for Computing Machinery
Pages: 56-63
Number of pages: 8
ISBN (Electronic): 9781450366892
State: Published - Jul 26 2020
Event: 2020 Conference on Practice and Experience in Advanced Research Computing: Catch the Wave, PEARC 2020 - Virtual, Online, United States
Duration: Jul 27 2020 to Jul 31 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2020 Conference on Practice and Experience in Advanced Research Computing: Catch the Wave, PEARC 2020
Country/Territory: United States
City: Virtual, Online
Period: 07/27/20 to 07/31/20

Funding

This work was funded by DOE contract number #DESC0012636, “Panorama—Predictive Modeling and Diagnostic Monitoring of Extreme Science Workflows”, and by the U.S. Department of Energy, Office of Science under contract DE-AC02-06CH11357. Also, this research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. Finally, we would like to thank Scott Callaghan from SCEC for his help in debugging the rvGAHP deployment on Summit’s login nodes and OLCF’s data transfer nodes (DTNs).

Keywords

  • Kubernetes
  • Pegasus
  • Scientific Workflows
  • Summit Supercomputer
