TY - GEN
T1 - PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips
T2 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021
AU - Xu, Yuanchao
AU - Belviranli, Mehmet E.
AU - Shen, Xipeng
AU - Vetter, Jeffrey
N1 - Publisher Copyright:
© 2021 Association for Computing Machinery.
PY - 2021/10/18
Y1 - 2021/10/18
N2 - Many slowdown models have been proposed to characterize memory interference of workloads co-running on heterogeneous System-on-Chips (SoCs). But they are mostly for post-silicon usage. How to effectively consider memory interference in the SoC design stage remains an open problem. This paper presents a new approach to this problem, consisting of a novel processor-centric slowdown modeling methodology and a new three-region interference-conscious slowdown model. The modeling process needs no measurement of co-running of various combinations of applications, but the produced slowdown models can be used to estimate the co-run slowdowns of arbitrary workloads on various SoC designs that embed a newer generation of accelerators, such as deep learning accelerators (DLA), in addition to CPUs and GPUs. The new method reduces the average prediction errors of the state-of-the-art model from 30.3% to 8.7% on GPU, from 13.4% to 3.7% on CPU, and from 20.6% to 5.6% on DLA, and demonstrates much improved efficacy in guiding SoC designs.
AB - Many slowdown models have been proposed to characterize memory interference of workloads co-running on heterogeneous System-on-Chips (SoCs). But they are mostly for post-silicon usage. How to effectively consider memory interference in the SoC design stage remains an open problem. This paper presents a new approach to this problem, consisting of a novel processor-centric slowdown modeling methodology and a new three-region interference-conscious slowdown model. The modeling process needs no measurement of co-running of various combinations of applications, but the produced slowdown models can be used to estimate the co-run slowdowns of arbitrary workloads on various SoC designs that embed a newer generation of accelerators, such as deep learning accelerators (DLA), in addition to CPUs and GPUs. The new method reduces the average prediction errors of the state-of-the-art model from 30.3% to 8.7% on GPU, from 13.4% to 3.7% on CPU, and from 20.6% to 5.6% on DLA, and demonstrates much improved efficacy in guiding SoC designs.
KW - Accelerator Architectures
KW - Performance Models
KW - System-on-Chips
UR - http://www.scopus.com/inward/record.url?scp=85118833844&partnerID=8YFLogxK
U2 - 10.1145/3466752.3480101
DO - 10.1145/3466752.3480101
M3 - Conference contribution
AN - SCOPUS:85118833844
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 1282
EP - 1295
BT - MICRO 2021 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
PB - IEEE Computer Society
Y2 - 18 October 2021 through 22 October 2021
ER -