Benchmarking machine learning methods for performance modeling of scientific applications

Preeti Malakar, Prasanna Balaprakash, Venkatram Vishwanath, Vitali Morozov, Kalyan Kumaran

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

54 Scopus citations

Abstract

Performance modeling is an important and active area of research in high-performance computing (HPC). It helps in better job scheduling and also improves the overall performance of coupled applications. Sufficiently rich analytical models are challenging to develop, however, because of interactions between different node components, network topologies, job interference, and application complexity. When analytical performance models become restrictive because of application dynamics and/or multicomponent interactions, machine-learning-based performance models can be helpful. While machine learning (ML) methods do not require underlying system or application knowledge, they are efficient in learning the unknown interactions of the application and system parameters empirically from application runs. We present a benchmark study in which we evaluate eleven machine learning methods for modeling the performance of four representative scientific applications that are irregular and have skewed domain configurations, on four leadership-class HPC platforms. We assess the impact of feature engineering, training-set size, modern hardware platforms, transfer learning, and extrapolation on prediction accuracy and on training and inference times. We find that bagging, boosting, and deep neural network ML methods are promising approaches with median R² values greater than 0.95, and these methods do not require feature engineering. We demonstrate that cross-platform performance prediction can be improved significantly using transfer learning with deep neural networks.
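The abstract reports model quality as the coefficient of determination, R², which measures the fraction of variance in observed run times that a model explains (1.0 is a perfect fit; 0.0 is no better than predicting the mean). A minimal pure-Python sketch of the metric, with made-up run times for illustration (the synthetic values are assumptions, not data from the paper):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)   # total variance
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residuals
    return 1.0 - ss_res / ss_tot

# Hypothetical observed vs. predicted application run times (seconds).
observed = [10.0, 12.0, 15.0, 20.0]
predicted = [10.5, 11.5, 15.5, 19.5]
print(round(r2_score(observed, predicted), 3))  # → 0.982
```

A median R² above 0.95 across configurations, as reported for the bagging, boosting, and deep neural network methods, thus means those models explain well over 95% of the run-time variance on a typical test set.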

Original language: English
Title of host publication: Proceedings of PMBS 2018
Subtitle of host publication: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 33-44
Number of pages: 12
ISBN (Electronic): 9781728101828
State: Published - Jul 2 2018
Externally published: Yes
Event: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2018 - Dallas, United States
Duration: Nov 12 2018 → …

Publication series

Name: Proceedings of PMBS 2018: Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, PMBS 2018
Country/Territory: United States
City: Dallas
Period: 11/12/18 → …

Funding

This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research program under contract number DE-AC02-06CH11357. We gratefully acknowledge the computing resources at the Argonne Leadership Computing Facility and National Energy Research Scientific Computing Center.

Keywords

  • benchmarking
  • machine learning
  • performance modeling
  • transfer learning
