TY - GEN
T1 - A fully empirical autotuned dense QR factorization for multicore architectures
AU - Agullo, Emmanuel
AU - Dongarra, Jack
AU - Nath, Rajib
AU - Tomov, Stanimire
PY - 2011
Y1 - 2011
N2 - Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable. The tuning process is indeed fully performed at install time in less than one hour and ten minutes on five out of seven platforms. We achieve an average performance varying from 97% to 100% of the optimum performance depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
AB - Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to efficiently prune the search space. Our method is automatic, fast and reliable. The tuning process is indeed fully performed at install time in less than one hour and ten minutes on five out of seven platforms. We achieve an average performance varying from 97% to 100% of the optimum performance depending on the platform. This work is a basis for autotuning the PLASMA library and enabling easy performance portability across hardware systems.
UR - http://www.scopus.com/inward/record.url?scp=80052325794&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23397-5_19
DO - 10.1007/978-3-642-23397-5_19
M3 - Conference contribution
AN - SCOPUS:80052325794
SN - 9783642233968
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 194
EP - 205
BT - Euro-Par 2011 Parallel Processing - 17th International Conference, Proceedings
T2 - 17th International Conference on Parallel Processing, Euro-Par 2011
Y2 - 29 August 2011 through 2 September 2011
ER -