TY - GEN
T1 - Varbench
T2 - 47th International Conference on Parallel Processing, ICPP 2018
AU - Kocoloski, Brian
AU - Lange, John
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/8/13
Y1 - 2018/8/13
AB - Performance variability is a major problem for extreme scale parallel computing applications that rely on bulk synchronization and collective communication. While this problem is most prominent in the context of exascale systems, it is increasingly impacting other communities such as machine learning and graph analytics. In this paper, we present an experimental performance analysis framework called varbench that is designed to precisely measure the prevalence of performance variability in a system, as well as to support workload characterization with respect to how and when a workload generates variability. We demonstrate several of varbench’s capabilities as they pertain to exascale-class systems, including its utility for discovering architectural trends, for performing cross-architectural comparisons, and for understanding key statistical properties of performance distributions that have implications for how system software should be designed to mitigate variability.
UR - https://www.scopus.com/pages/publications/85054797817
U2 - 10.1145/3225058.3225125
DO - 10.1145/3225058.3225125
M3 - Conference contribution
AN - SCOPUS:85054797817
SN - 9781450365109
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018
PB - Association for Computing Machinery
Y2 - 13 August 2018 through 16 August 2018
ER -