Varbench: an experimental framework to measure and characterize performance variability

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

Performance variability is a major problem for extreme scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it is increasingly impacting other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench’s capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how system software should be designed to mitigate variability.

Original language: English
Title of host publication: Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018
Publisher: Association for Computing Machinery
ISBN (Print): 9781450365109
State: Published - Aug 13 2018
Externally published: Yes
Event: 47th International Conference on Parallel Processing, ICPP 2018 - Eugene, United States
Duration: Aug 13 2018 → Aug 16 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 47th International Conference on Parallel Processing, ICPP 2018
Country/Territory: United States
City: Eugene
Period: 08/13/18 → 08/16/18
