Machine-learning-based load balancing for community ice code component in CESM

Prasanna Balaprakash, Yuri Alexeev, Sheri A. Mickelson, Sven Leyffer, Robert Jacob, Anthony Craig

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

11 Scopus citations

Abstract

Load balancing scientific codes on massively parallel architectures is becoming an increasingly challenging task. In this paper, we focus on the Community Earth System Model, a widely used climate modeling code. It comprises six components, each of which exhibits different scalability patterns. Previously, an analytical performance model has been used to find optimal load-balancing parameter configurations for each component. Nevertheless, for the Community Ice Code component, the analytical performance model is too restrictive to capture its scalability patterns. We therefore developed a machine-learning-based load-balancing algorithm. It involves fitting a surrogate model to a small number of load-balancing configurations and their corresponding runtimes. This model is then used to find high-quality parameter configurations. Compared with the current practice of expert-knowledge-based enumeration over feasible configurations, the machine-learning-based load-balancing algorithm requires six times fewer evaluations to find the optimal configuration.
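The surrogate-driven loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not name the learning method, so an ordinary least-squares fit over a small hand-picked feature map stands in for the surrogate, and `synthetic_runtime`, `features`, and `surrogate_search` are hypothetical names. The runtime function is synthetic and has no connection to measured Community Ice Code timings.

```python
import numpy as np


def synthetic_runtime(block_size: int) -> float:
    """Hypothetical stand-in for one expensive timed run (not real CICE data)."""
    return 100.0 / block_size + 0.5 * block_size + 3.0


def features(x):
    """Feature map assumed for this sketch: [1, x, 1/x]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, 1.0 / x])


def surrogate_search(candidates, evaluate, n_init=4, n_iter=6, seed=0):
    """Fit a surrogate to evaluated configurations, then evaluate its predicted best."""
    rng = np.random.default_rng(seed)
    candidates = list(candidates)

    # Start from a small random sample of configurations and time each one.
    tried = [int(c) for c in rng.choice(candidates, size=n_init, replace=False)]
    runtimes = [evaluate(c) for c in tried]

    for _ in range(n_iter):
        # Refit the surrogate on every (configuration, runtime) pair seen so far.
        coef, *_ = np.linalg.lstsq(features(tried), np.array(runtimes), rcond=None)

        remaining = [c for c in candidates if c not in tried]
        if not remaining:
            break

        # Evaluate only the untried configuration the surrogate predicts is fastest.
        predicted = features(remaining) @ coef
        nxt = remaining[int(np.argmin(predicted))]
        tried.append(nxt)
        runtimes.append(evaluate(nxt))

    best = int(np.argmin(runtimes))
    return tried[best], runtimes[best], len(tried)
```

Calling `surrogate_search(range(1, 65), synthetic_runtime)` returns the best configuration found, its runtime, and the number of evaluations spent, which is at most `n_init + n_iter` rather than one evaluation per candidate. The saving in this toy case reflects the synthetic function matching the assumed feature map and should not be read as a reproduction of the paper's result.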

Original language: English
Title of host publication: High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers
Editors: Osni Marques, Michel Dayde, Kengo Nakajima
Publisher: Springer Verlag
Pages: 79-91
Number of pages: 13
ISBN (Print): 9783319173528
DOIs
State: Published - 2015
Externally published: Yes
Event: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014 - Eugene, United States
Duration: Jun 30 2014 - Jul 3 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8969
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014
Country/Territory: United States
City: Eugene
Period: 06/30/14 - 07/3/14

Funding

This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. An award of computer time was provided by the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. This research used resources of the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357. The submitted manuscript has been created by the UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne) under Contracts No. DE-AC02-06CH11357 and DE-FG02-05ER25694 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The NCAR is sponsored by the National Science Foundation.

Funders (with funder number where listed)
U.S. Department of Energy
National Science Foundation
U.S. Department of Energy
Office of Science
Advanced Scientific Computing Research: DE-AC02-06CH11357
Argonne National Laboratory
