TY - GEN
T1 - Diagnosis and optimization of application prefetching performance
AU - Marin, Gabriel
AU - McCurdy, Collin
AU - Vetter, Jeffrey S.
PY - 2013
Y1 - 2013
AB - Hardware prefetchers are effective at recognizing streaming memory access patterns and at moving data closer to the processing units to hide memory latency. However, hardware prefetchers can track only a limited number of data streams due to finite hardware resources. In this paper, we introduce the term streaming concurrency to characterize the number of parallel, logical data streams in an application. We present a simulation algorithm for understanding the streaming concurrency at any point in an application, and we show that this metric is a good predictor of the number of memory requests initiated by streaming prefetchers. Next, we investigate the causes of poor prefetching performance. We identify four prefetch-unfriendly conditions and show how to classify an application's memory references based on these conditions. We evaluate our analysis using the SPEC CPU2006 benchmark suite. We selected two benchmarks with unfavorable access patterns and transformed them to improve their prefetching effectiveness. Results show that making applications more prefetcher-friendly can yield meaningful performance gains.
KW - diagnosis
KW - performance modeling
KW - stream prefetching
UR - http://www.scopus.com/inward/record.url?scp=84879803947&partnerID=8YFLogxK
U2 - 10.1145/2464996.2465014
DO - 10.1145/2464996.2465014
M3 - Conference contribution
AN - SCOPUS:84879803947
SN - 9781450321303
T3 - Proceedings of the International Conference on Supercomputing
SP - 303
EP - 312
BT - ICS 2013 - Proceedings of the 2013 ACM International Conference on Supercomputing
T2 - 27th ACM International Conference on Supercomputing, ICS 2013
Y2 - 10 June 2013 through 14 June 2013
ER -