Towards a Standard Process Management Infrastructure for Workflows Using Python

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Orchestrating the execution of ensembles of processes lies at the core of scientific workflow engines on large-scale parallel platforms. This is usually handled with platform-specific command-line tools, which offer limited process management control and can strain system resources. The PMIx standard provides a uniform interface to system resources. However, the low-level C implementation of PMIx has hampered its use in workflow engines, and the Python bindings developed to address this have yet to gain traction. In this paper, we present our work to harden the PMIx Python client and demonstrate its usability with a prototype Python driver that orchestrates the execution of an ensemble of processes. We present experimental results obtained with the prototype on the Summit supercomputer at Oak Ridge National Laboratory. This work lays the foundation for wider adoption of PMIx in workflow engines and encourages broader support for PMIx functionality in vendor-provided system software stacks.

Original language: English
Title of host publication: Parallel and Distributed Computing, Applications and Technologies - 23rd International Conference, PDCAT 2022, Proceedings
Editors: Hiroyuki Takizawa, Hong Shen, Toshihiro Hanawa, Jong Hyuk Park, Hui Tian, Ryusuke Egawa
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 523-534
Number of pages: 12
ISBN (Print): 9783031299261
State: Published - 2023
Event: 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022 - Sendai, Japan
Duration: Dec 7, 2022 to Dec 9, 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13798 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022
Country/Territory: Japan
City: Sendai
Period: 12/7/22 to 12/9/22

Funding

Acknowledgements. The original Python bindings were developed by Ralph Castain and Danielle Sikich with support from Intel and Argonne National Laboratory. We would like to thank Ralph Castain for his continued efforts and hard work spearheading the PMIx project. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was partially supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
