TY - GEN
T1 - Investigating the TLB behavior of high-end scientific applications on commodity microprocessors
AU - McCurdy, Collin
AU - Cox, Alan L.
AU - Vetter, Jeffrey
PY - 2008
Y1 - 2008
N2 - The floating point portion of the SPEC CPU suite and the HPC Challenge suite are widely recognized and utilized as benchmarks that represent scientific application behavior. In this work we show that while these benchmark suites may be representative of the cache behavior of production scientific applications, they do not accurately represent the TLB behavior of these applications. Furthermore, we demonstrate that the difference can have a significant impact on performance. In the first part of the paper we present results from implementation-independent trace-based simulations which demonstrate that benchmarks exhibit significantly different TLB behavior for a range of page sizes than a representative set of production applications. In the second part we validate these results on the AMD Opteron implementation of the x86 architecture, showing that false conclusions about choice of page size, drawn from benchmark performance, can result in performance degradations of up to nearly 50% for the production applications we investigated.
AB - The floating point portion of the SPEC CPU suite and the HPC Challenge suite are widely recognized and utilized as benchmarks that represent scientific application behavior. In this work we show that while these benchmark suites may be representative of the cache behavior of production scientific applications, they do not accurately represent the TLB behavior of these applications. Furthermore, we demonstrate that the difference can have a significant impact on performance. In the first part of the paper we present results from implementation-independent trace-based simulations which demonstrate that benchmarks exhibit significantly different TLB behavior for a range of page sizes than a representative set of production applications. In the second part we validate these results on the AMD Opteron implementation of the x86 architecture, showing that false conclusions about choice of page size, drawn from benchmark performance, can result in performance degradations of up to nearly 50% for the production applications we investigated.
UR - http://www.scopus.com/inward/record.url?scp=52249092401&partnerID=8YFLogxK
U2 - 10.1109/ISPASS.2008.4510742
DO - 10.1109/ISPASS.2008.4510742
M3 - Conference contribution
AN - SCOPUS:52249092401
SN - 9781424422326
T3 - ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and Software
SP - 95
EP - 104
BT - ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and Software
T2 - IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2008
Y2 - 20 April 2008 through 22 April 2008
ER -