Abstract
Orchestrating the execution of ensembles of processes lies at the core of scientific workflow engines on large-scale parallel platforms. This is usually handled using platform-specific command-line tools, with limited process management control and potential strain on system resources. The PMIx standard provides a uniform interface to system resources. The low-level C implementation of PMIx has hampered its use in workflow engines, leading to the development of Python bindings that have yet to gain traction. In this paper, we present our work to harden the PMIx Python client, demonstrating its usability using a prototype Python driver to orchestrate the execution of an ensemble of processes. We present experimental results using the prototype on the Summit supercomputer at Oak Ridge National Laboratory. This work lays the foundation for wider adoption of PMIx for workflow engines, and encourages wider support of more PMIx functionality in vendor-provided system software stacks.
Original language | English |
---|---|
Title of host publication | Parallel and Distributed Computing, Applications and Technologies - 23rd International Conference, PDCAT 2022, Proceedings |
Editors | Hiroyuki Takizawa, Hong Shen, Toshihiro Hanawa, Jong Hyuk Park, Hui Tian, Ryusuke Egawa |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 523-534 |
Number of pages | 12 |
ISBN (Print) | 9783031299261 |
State | Published - 2023 |
Event | 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022 - Sendai, Japan. Duration: Dec 7 2022 → Dec 9 2022 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 13798 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 23rd International Conference on Parallel and Distributed Computing, Applications, and Technologies, PDCAT 2022 |
---|---|
Country/Territory | Japan |
City | Sendai |
Period | 12/7/22 → 12/9/22 |
Funding
Acknowledgements. The original Python bindings were developed by Ralph Castain and Danielle Sikich with support from Intel and Argonne National Laboratory. We would like to thank Ralph Castain for his continued efforts and hard work spearheading the PMIx project. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was partially supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.