TY - GEN
T1 - A class of hybrid LAPACK algorithms for multicore and GPU architectures
AU - Horton, Mitch
AU - Tomov, Stanimire
AU - Dongarra, Jack
PY - 2011
Y1 - 2011
N2 - Three out of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems from the list are using processors with six or more cores. Three hundred sixty-five systems use quad-core processors. Thirty-seven systems are using dual-core processors. The large-scale enabling of hybrid graphics processing unit (GPU)-based multicore platforms for computational science by developing fundamental numerical libraries (in particular, libraries in the area of dense linear algebra) for them has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra for GPU and Multicore Architectures (MAGMA) Library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels factored on the CPU using LAPACK are, instead, done in parallel using a highly optimized dynamic asynchronous scheduled algorithm on some number of CPU cores. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
AB - Three out of the top four supercomputers in the November 2010 TOP500 list of the world's most powerful supercomputers use NVIDIA GPUs to accelerate computations. Ninety-five systems from the list are using processors with six or more cores. Three hundred sixty-five systems use quad-core processors. Thirty-seven systems are using dual-core processors. The large-scale enabling of hybrid graphics processing unit (GPU)-based multicore platforms for computational science by developing fundamental numerical libraries (in particular, libraries in the area of dense linear algebra) for them has been underway for some time. We present a class of algorithms based largely on software infrastructures that have already been developed for homogeneous multicores and hybrid GPU-based computing. The algorithms extend what is currently available in the Matrix Algebra for GPU and Multicore Architectures (MAGMA) Library for performing Cholesky, QR, and LU factorizations using a single core or socket and a single GPU. The extensions occur in two areas. First, panels factored on the CPU using LAPACK are, instead, done in parallel using a highly optimized dynamic asynchronous scheduled algorithm on some number of CPU cores. Second, the remaining CPU cores are used to update the rightmost panels of the matrix in parallel.
KW - Cholesky
KW - GPU
KW - LU
KW - Multicore
KW - QR
UR - http://www.scopus.com/inward/record.url?scp=80054974933&partnerID=8YFLogxK
U2 - 10.1109/SAAHPC.2011.18
DO - 10.1109/SAAHPC.2011.18
M3 - Conference contribution
AN - SCOPUS:80054974933
SN - 9780769544489
T3 - Proceedings - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
SP - 150
EP - 158
BT - Proceedings - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
T2 - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
Y2 - 19 July 2011 through 20 July 2011
ER -