TY - GEN
T1 - An efficient framework for multi-dimensional tuning of high performance computing applications
AU - Cong, Guojing
AU - Wen, Huifang
AU - Chung, I. Hsin
AU - Klepacki, David
AU - Murata, Hiroki
AU - Negishi, Yasushi
PY - 2012
Y1 - 2012
N2 - Deploying an application onto a target platform for high performance often demands manual tuning by experts. As machine architecture gets increasingly complex, tuning becomes even more challenging and calls for systematic approaches. In our earlier work we presented a prototype that efficiently combines expert knowledge, static analysis, and runtime observation for bottleneck detection, and employs refactoring and compiler feedback for mitigation. In this study, we develop a software tool that facilitates fast searching of bottlenecks and effective mitigation of problems from major dimensions of computing (e.g., computation, communication, and I/O). The impact of our approach is demonstrated by the tuning of the LBMHD code and a Poisson solver code, representing traditional scientific codes, and a graph analysis code in UPC, representing emerging programming paradigms. In the experiments, our framework detects intricate bottlenecks of memory access, I/O, and communication with a single run of the application. Moreover, the automated solution implementation yields significant overall performance improvement on the target platforms. The improvement for LBMHD is up to 45%, and the speedup for the UPC code is up to 5. These results suggest that our approach is a concrete step towards systematic tuning of high performance computing applications.
AB - Deploying an application onto a target platform for high performance often demands manual tuning by experts. As machine architecture gets increasingly complex, tuning becomes even more challenging and calls for systematic approaches. In our earlier work we presented a prototype that efficiently combines expert knowledge, static analysis, and runtime observation for bottleneck detection, and employs refactoring and compiler feedback for mitigation. In this study, we develop a software tool that facilitates fast searching of bottlenecks and effective mitigation of problems from major dimensions of computing (e.g., computation, communication, and I/O). The impact of our approach is demonstrated by the tuning of the LBMHD code and a Poisson solver code, representing traditional scientific codes, and a graph analysis code in UPC, representing emerging programming paradigms. In the experiments, our framework detects intricate bottlenecks of memory access, I/O, and communication with a single run of the application. Moreover, the automated solution implementation yields significant overall performance improvement on the target platforms. The improvement for LBMHD is up to 45%, and the speedup for the UPC code is up to 5. These results suggest that our approach is a concrete step towards systematic tuning of high performance computing applications.
UR - http://www.scopus.com/inward/record.url?scp=84866866019&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.124
DO - 10.1109/IPDPS.2012.124
M3 - Conference contribution
AN - SCOPUS:84866866019
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 1376
EP - 1387
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -