LU factorization of small matrices: Accelerating batched DGETRF on the GPU

Tingxing Dong, Azzam Haidar, Piotr Luszczek, James Austin Harris, Stanimire Tomov, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

41 Scopus citations

Abstract

Gaussian elimination is commonly used to solve dense linear systems in scientific models. In a large number of applications, a need arises to solve many small problems instead of a few large linear systems. The size of each of these small linear systems depends on the number of ordinary differential equations (ODEs) used in the model, and can be on the order of hundreds of unknowns. To efficiently exploit the computing power of modern accelerator hardware, these linear systems are processed in batches. To improve numerical stability, at least partial pivoting is required, most often accomplished with row pivoting. However, row pivoting can result in a severe performance penalty on GPUs because it introduces thread divergence and non-coalesced memory accesses. In this paper, we propose a batched LU factorization for GPUs that uses a multi-level blocked right-looking algorithm, which preserves the data layout but minimizes the penalty of partial pivoting. Our batched LU achieves up to a 2.5-fold speedup compared to the alternative CUBLAS solution on a K40c GPU.
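To make the terms in the abstract concrete, the following is a minimal CPU sketch of LU factorization with partial (row) pivoting applied independently to each matrix in a batch. It is an illustration only: it is an unblocked right-looking loop in pure Python, not the multi-level blocked GPU algorithm the paper proposes, and the names `lu_partial_pivot` and `batched_lu` are hypothetical and do not come from the paper or from any library.

```python
def lu_partial_pivot(a):
    """Factor one square matrix (a list of row lists) in place.

    Returns the pivot list: piv[k] is the row swapped with row k at step k.
    Afterwards `a` holds U on and above the diagonal and the multipliers
    of the unit lower triangular factor L below it.
    Hypothetical helper for illustration, not the paper's kernel.
    """
    n = len(a)
    piv = []
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        piv.append(p)
        if p != k:
            a[k], a[p] = a[p], a[k]  # row swap
        # Right-looking step: scale column k, then update the trailing block.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return piv


def batched_lu(batch):
    """Factor every matrix in the batch independently; return all pivot lists."""
    return [lu_partial_pivot(a) for a in batch]


batch = [
    [[0.0, 2.0], [1.0, 3.0]],  # needs a row swap: the leading entry is zero
    [[4.0, 3.0], [6.0, 3.0]],
]
pivots = batched_lu(batch)
```

The row swap and the data-dependent pivot search inside the loop are the operations the abstract identifies as costly on GPUs, since different matrices in a batch may choose different pivot rows.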

Original language: English
Title of host publication: Proceedings - 16th IEEE International Conference on High Performance Computing and Communications, HPCC 2014, 11th IEEE International Conference on Embedded Software and Systems, ICESS 2014 and 6th International Symposium on Cyberspace Safety and Security, CSS 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 157-160
Number of pages: 4
ISBN (Electronic): 9781479961238
State: Published - Mar 9 2014
Event: 16th IEEE International Conference on High Performance Computing and Communications, HPCC 2014, 11th IEEE International Conference on Embedded Software and Systems, ICESS 2014 and 6th International Symposium on Cyberspace Safety and Security, CSS 2014 - Paris, France
Duration: Aug 20 2014 to Aug 22 2014

Publication series

Name: Proceedings - 16th IEEE International Conference on High Performance Computing and Communications, HPCC 2014, 11th IEEE International Conference on Embedded Software and Systems, ICESS 2014 and 6th International Symposium on Cyberspace Safety and Security, CSS 2014

Conference

Conference: 16th IEEE International Conference on High Performance Computing and Communications, HPCC 2014, 11th IEEE International Conference on Embedded Software and Systems, ICESS 2014 and 6th International Symposium on Cyberspace Safety and Security, CSS 2014
Country/Territory: France
City: Paris
Period: 08/20/14 to 08/22/14

Keywords

  • GPU
  • Gaussian Elimination
  • batched
