Abstract
With the raw computing power of graphics processing units (GPUs) being more widely available in commodity multicore systems, there is an imminent need to harness their power for important numerical libraries such as LAPACK. In this paper, we consider the solution of dense symmetric and Hermitian eigenproblems by the LAPACK divide and conquer algorithm on such modern heterogeneous systems. We focus on how to make the best use of the individual strengths of the massively parallel manycore GPUs and multicore CPUs. The resulting algorithm overcomes performance bottlenecks faced by current implementations that are optimized for a homogeneous multicore. On a dual socket quad-core Intel Xeon 2.33 GHz with an NVIDIA GTX 280 GPU, we typically obtain up to about a tenfold improvement in performance for the complete dense problem. The techniques described here thus represent an example of how to develop numerical software to efficiently use heterogeneous architectures. As heterogeneity becomes more common in the architecture design, the significance of and need for this work are expected to grow.
Original language | English |
---|---|
Pages (from-to) | C70-C82 |
Journal | SIAM Journal on Scientific Computing |
Volume | 34 |
Issue number | 2 |
State | Published - 2012 |
Externally published | Yes |
Keywords
- GPU
- Heterogeneous computing
- Hybrid architecture
- LAPACK
- Multicore
- Performance
- Symmetric eigenvalue problem