Abstract
Designing a balanced HPC system requires an understanding of the dominant performance bottlenecks. There is as yet no well-established methodology for a unified evaluation of HPC systems and workloads that quantifies the main performance bottlenecks. In this paper, we execute seven production HPC applications on a production HPC platform and analyse the key performance bottlenecks, FLOPS performance and memory bandwidth congestion, together with their implications for scaling out. We show that the results depend significantly on the number of execution processes and the granularity of measurements. We therefore advocate that application suites provide guidance on selecting a representative scale for the experiments. We also propose that FLOPS performance and memory bandwidth be represented in terms of the proportions of time with low, moderate and severe utilization. We show that this gives much more precise and actionable evidence than an average value.
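To illustrate the proposed representation, the following is a minimal sketch, not the authors' tooling. It assumes utilization samples taken at equal time intervals, so that the share of samples in a band equals the share of time spent in it. The function names, the 40% and 80% band boundaries, and the sample values are all hypothetical and chosen only for illustration; the abstract does not state the thresholds.

```python
def utilization_profile(samples, peak, moderate_from=0.4, severe_from=0.8):
    """Return the fraction of samples in the low, moderate and severe bands.

    samples: measured values (e.g. memory bandwidth in GB/s), assumed to be
             taken at equal time intervals.
    peak: the sustainable maximum in the same unit as the samples.
    moderate_from, severe_from: hypothetical band boundaries, expressed as
             fractions of the peak.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if peak <= 0:
        raise ValueError("peak must be positive")

    counts = {"low": 0, "moderate": 0, "severe": 0}
    for value in samples:
        ratio = value / peak
        if ratio >= severe_from:
            counts["severe"] += 1
        elif ratio >= moderate_from:
            counts["moderate"] += 1
        else:
            counts["low"] += 1

    total = len(samples)
    return {band: count / total for band, count in counts.items()}


def mean_utilization(samples, peak):
    """Average utilization as a fraction of the peak."""
    return sum(samples) / (len(samples) * peak)


# Two made-up traces with the same average but different behaviour.
PEAK = 100.0
bursty = [10.0] * 5 + [90.0] * 5   # alternates between idle and saturated
steady = [50.0] * 10               # constant half load

if __name__ == "__main__":
    for name, trace in (("bursty", bursty), ("steady", steady)):
        print(name, mean_utilization(trace, PEAK), utilization_profile(trace, PEAK))
```

Both made-up traces have the same average utilization, yet only the bursty one spends half of its time in the severe band, which is the kind of difference an average alone cannot show.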
Original language | English |
---|---|
Title of host publication | Euro-Par 2018 |
Subtitle of host publication | Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Proceedings |
Editors | Massimo Torquati, Marco Aldinucci, Luca Padovani |
Publisher | Springer Verlag |
Pages | 135-146 |
Number of pages | 12 |
ISBN (Print) | 9783319969824 |
State | Published - 2018 |
Externally published | Yes |
Event | 24th International European Conference on Parallel and Distributed Computing, Euro-Par 2018 - Turin, Italy. Duration: Aug 27 2018 → Aug 31 2018 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11014 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 24th International European Conference on Parallel and Distributed Computing, Euro-Par 2018 |
---|---|
Country/Territory | Italy |
City | Turin |
Period | 08/27/18 → 08/31/18 |
Funding
This work was supported by the Spanish Ministry of Science and Technology (project TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), Severo Ochoa Programme (SEV-2015-0493) of the Spanish Government; and the European Union’s Horizon 2020 research and innovation programme under ExaNoDe project (grant agreement No 671578).
Keywords
- Bottlenecks
- FLOPS
- HPC applications
- Memory bandwidth
- Scaling-out