Multi-GPU implementation of LU factorization

Yulu Jia, Piotr Luszczek, Jack Dongarra

Research output: Contribution to journal > Conference article > peer-review

13 Scopus citations

Abstract

LU factorization is the most computationally intensive step in solving systems of linear equations. By first obtaining the LU factorization of the coefficient matrix, we may then readily solve the system using forward and backward substitution. The computational cost of LU factorization in terms of floating-point operations is cubic in the matrix dimension. There have been various efforts to improve the performance of LU factorization. We propose a multi-core multi-GPU hybrid LU factorization algorithm that leverages the strengths of both multiple CPUs and multiple GPUs. Our algorithm uses some of the CPU cores for panel factorization, and the rest of the CPU cores together with all the available GPUs for trailing submatrix updates. Our algorithm employs both dynamic scheduling and static scheduling. Experiments show that our approach reaches 1134 Gflop/s with 4 Fermi GPU boards combined with a total of 48 AMD CPU cores. This is the first time such a level of performance has been reported in a shared memory environment. Execution traces show that our code also achieves good load balance and high system utilization.
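The split between panel factorization and trailing submatrix update described in the abstract can be illustrated with a minimal sequential sketch of blocked, right-looking LU factorization with partial pivoting. This is not the authors' implementation: it runs on a single core in pure Python, does no scheduling, and the function names (`panel_factor`, `trailing_update`, `blocked_lu`, `lu_solve`) and the `block` parameter are hypothetical names chosen for illustration. In the hybrid scheme the abstract describes, the work done in `panel_factor` would go to some of the CPU cores, while the work in `trailing_update` would be shared by the remaining cores and the GPUs.

```python
def panel_factor(a, perm, k, kb):
    """Factor the panel (columns k .. k+kb-1, rows k .. n-1) in place."""
    n = len(a)
    for j in range(k, k + kb):
        # Partial pivoting: largest magnitude in column j, on or below the diagonal.
        p = max(range(j, n), key=lambda i: abs(a[i][j]))
        if a[p][j] == 0.0:
            raise ValueError("matrix is singular")
        if p != j:
            a[j], a[p] = a[p], a[j]          # swap entire rows
            perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]               # multiplier, stored in place of L
            for c in range(j + 1, k + kb):   # update remaining panel columns only
                a[i][c] -= a[i][j] * a[j][c]


def trailing_update(a, k, kb):
    """Update everything to the right of the panel that was just factored."""
    n = len(a)
    end = k + kb
    # Triangular solve: U12 = L11^{-1} * A12, with L11 unit lower triangular.
    for j in range(k, end):
        for i in range(j + 1, end):
            for c in range(end, n):
                a[i][c] -= a[i][j] * a[j][c]
    # Matrix-matrix update: A22 = A22 - L21 * U12.
    for i in range(end, n):
        for j in range(k, end):
            lij = a[i][j]
            for c in range(end, n):
                a[i][c] -= lij * a[j][c]


def blocked_lu(matrix, block=2):
    """Return (lu, perm) such that row i of P*A is row perm[i] of A and P*A = L*U."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    perm = list(range(n))
    for k in range(0, n, block):
        kb = min(block, n - k)
        panel_factor(a, perm, k, kb)
        trailing_update(a, k, kb)
    return a, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors from blocked_lu."""
    n = len(lu)
    x = [float(b[p]) for p in perm]          # apply the row permutation to b
    for i in range(n):                       # forward substitution with unit L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):             # backward substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
    b = [5, -2, 9]
    lu, perm = blocked_lu(A, block=2)
    print(lu_solve(lu, perm, b))
```

For large matrices most of the arithmetic falls in the matrix-matrix update inside `trailing_update`, which is one reason that step is a natural candidate for GPUs, while the panel step is dominated by pivot searches and row swaps.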

Original language: English
Pages (from-to): 106-115
Number of pages: 10
Journal: Procedia Computer Science
Volume: 9
State: Published - 2012
Event: 12th Annual International Conference on Computational Science, ICCS 2012 - Omaha, NB, United States
Duration: Jun 4 2012 - Jun 6 2012

Funding

This work was supported by NSF through grant 1038814.

Keywords

  • Hardware accelerators
  • Hybrid
  • LU factorization
  • Multi-core multi-GPU
