Abstract
This chapter presents the current best design and implementation practices for the acceleration of dense linear algebra (DLA) on GPUs. Examples are given with fundamental algorithms, from the matrix-matrix multiplication kernel written in CUDA to higher-level algorithms for solving linear systems, eigenvalue problems, and SVD problems. The implementations are available through the MAGMA library, a redesign of the popular LAPACK library for GPUs. To generate the extreme level of parallelism needed for the efficient use of GPUs, the algorithms of interest are redesigned and then split into well-chosen computational tasks. The execution of these tasks is scheduled over the computational components of a hybrid system of multicore CPUs and GPU accelerators, using either static scheduling or a light-weight runtime system. The use of a light-weight runtime system keeps the scheduling overhead low, comparable to that of static scheduling, while enabling parallelism to be expressed through sequential-like code. This simplifies the development effort and allows the exploration of the unique strengths of the various hardware components.
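The abstract names a CUDA matrix-matrix multiplication kernel as the chapter's starting example. The sketch below is a minimal illustration of such a kernel (shared-memory tiled single-precision GEMM), not the chapter's optimized implementation or MAGMA's GEMM; the kernel name `sgemm_tiled`, the tile size `TILE`, and the host-side test harness are assumptions for illustration only.

```cuda
// Illustrative sketch only: a shared-memory tiled C = A * B for square N x N
// row-major matrices. Names, tile size, and test harness are assumptions,
// not the chapter's or MAGMA's actual code.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define TILE 16

__global__ void sgemm_tiled(int N, const float *A, const float *B, float *C)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;   // row of C this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;   // column of C this thread computes
    float acc = 0.0f;

    // Sweep over the tiles of A and B that contribute to C(row, col).
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < N && col < N)
        C[row * N + col] = acc;
}

int main(void)
{
    const int N = 512;
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float*)malloc(bytes), *hB = (float*)malloc(bytes), *hC = (float*)malloc(bytes);
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    sgemm_tiled<<<grid, block>>>(N, dA, dB, dC);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // With A filled with 1 and B filled with 2, every entry of C equals 2*N.
    printf("C[0] = %f (expected %f)\n", hC[0], 2.0f * N);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}
```

In practice, production GEMM kernels of the kind the chapter discusses go well beyond this tiling (register blocking, double buffering, autotuned blocking factors); the sketch only conveys the basic decomposition of the computation into GPU thread blocks.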
| Original language | English |
| --- | --- |
| Title of host publication | Numerical Computations with GPUs |
| Publisher | Springer International Publishing |
| Pages | 3-28 |
| Number of pages | 26 |
| ISBN (Electronic) | 9783319065489 |
| ISBN (Print) | 9783319065472 |
| DOIs | |
| State | Published - Jan 1 2014 |