Job Management and Task Bundling

Evan Berkowitz, Gustav R. Jansen, Kenneth McElvain, André Walker-Loud

Research output: Contribution to journal, Conference article, peer-review

14 Scopus citations

Abstract

High-performance computing is often performed on scarce, shared computing resources. To ensure these machines are used at full capacity, administrators often incentivize large workloads that would not be possible on smaller systems. Measurements in Lattice QCD, however, frequently do not scale to machine-size workloads. By bundling tasks together, we can create large jobs suitable for very large partitions. We discuss METAQ and mpi-jm, software developed to dynamically group computational tasks together; both can intelligently backfill to consume idle time without substantial changes to users' current workflows or executables.
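To illustrate the idea of bundling and backfilling, here is a minimal sketch, assuming a toy model in which one large allocation of `total_nodes` nodes runs a queue of independent tasks, each needing a fixed number of nodes for a known duration. This is not the scheduling logic of METAQ or mpi-jm; the `Task` type and `bundle` function are hypothetical names chosen for illustration. Whenever nodes go idle, the sketch scans the pending queue in order and starts every task that still fits, so small tasks fill gaps that a larger task at the head of the queue cannot use.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical task description (not METAQ's or mpi-jm's format).
    name: str
    nodes: int        # nodes the task needs while it runs
    duration: float   # wall-clock time the task runs


def bundle(tasks, total_nodes):
    """Greedily backfill independent tasks into one allocation.

    Returns (schedule, makespan, utilization), where schedule is a list of
    (start_time, task_name) pairs and utilization is busy node-time divided
    by allocated node-time.
    """
    if not tasks:
        return [], 0.0, 0.0
    for t in tasks:
        if t.nodes > total_nodes:
            raise ValueError(f"{t.name} needs {t.nodes} nodes > {total_nodes}")

    pending = list(tasks)
    running = []      # min-heap of (end_time, seq, task); seq breaks ties
    schedule = []
    free = total_nodes
    now = 0.0
    seq = 0
    while pending or running:
        # Backfill: start every pending task that fits in the idle nodes.
        still_pending = []
        for t in pending:
            if t.nodes <= free:
                free -= t.nodes
                heapq.heappush(running, (now + t.duration, seq, t))
                seq += 1
                schedule.append((now, t.name))
            else:
                still_pending.append(t)
        pending = still_pending

        # Advance to the next completion and release its nodes, along with
        # any other tasks that finish at the same moment.
        now, _, done = heapq.heappop(running)
        free += done.nodes
        while running and running[0][0] == now:
            _, _, other = heapq.heappop(running)
            free += other.nodes

    busy = sum(t.nodes * t.duration for t in tasks)
    return schedule, now, busy / (total_nodes * now)


if __name__ == "__main__":
    queue = [Task("A", 4, 2), Task("B", 2, 3), Task("C", 1, 1),
             Task("D", 1, 1), Task("E", 2, 1)]
    sched, makespan, util = bundle(queue, total_nodes=4)
    print(sched, makespan, util)
```

With the example queue on 4 nodes, task E does not fit at time 2 while B, C and D are running, but it starts at time 3 once C and D release their nodes; the whole bundle finishes at time 5 with 90% node utilization. A real job manager must also handle unknown run times, task failures and dependencies, which this sketch deliberately ignores.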

Original language: English
Article number: 09007
Journal: EPJ Web of Conferences
Volume: 175
State: Published, Mar 26 2018
Event: 35th International Symposium on Lattice Field Theory, Lattice 2017, Granada, Spain
Duration: Jun 18 2017 to Jun 24 2017

Funding

METAQ was tested on aztec, ca, surface, and vulcan at LLNL through the Multiprogrammatic and Institutional Computing and Grand Challenge programs. METAQ was also tested and used in production on titan at the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725, and on edison and cori at NERSC, the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work was supported in part by the Office of Science, Department of Energy, Office of Advanced Scientific Computing Research through the CalLat SciDAC3 grant under Award Number KB0301052. This work was supported in part by the DFG and the NSFC Sino-German CRC110. This research used resources of the Oak Ridge Leadership Computing Facility located at ORNL, which is supported by the Office of Science of the Department of Energy under Contract No. DE-AC05-00OR22725.
