TY - GEN
T1 - Programming the LU factorization for a multicore system with accelerators
AU - Kurzak, Jakub
AU - Luszczek, Piotr
AU - Faverge, Mathieu
AU - Dongarra, Jack
PY - 2013
Y1 - 2013
N2 - LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
AB - LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid shared-memory system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny-Cours CPUs and four NVIDIA Fermi GPUs.
UR - http://www.scopus.com/inward/record.url?scp=84883285586&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38718-0_6
DO - 10.1007/978-3-642-38718-0_6
M3 - Conference contribution
AN - SCOPUS:84883285586
SN - 9783642387173
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 28
EP - 35
BT - High Performance Computing for Computational Science, VECPAR 2012 - 10th International Conference, Revised Selected Papers
T2 - 10th International Conference on High Performance Computing for Computational Science, VECPAR 2012
Y2 - 17 July 2012 through 20 July 2012
ER -