Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication

Azzam Haidar, Mark Gates, Stan Tomov, Jack Dongarra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe a successful methodology on how to address the challenges - starting from our algorithm design, kernel optimization and tuning, to our programming model - in the development of a scalable high-performance tridiagonal reduction algorithm for the symmetric eigenvalue problem. This is a fundamental linear algebra problem with many engineering and physics applications. We use a combination of a task-based approach to parallelism and a new algorithmic design to achieve high performance. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. This may increase the number of flops, but the increase is offset by the more efficient execution and reduced data transfers. Our performance results are the best available, providing an enormous performance boost compared to current state-of-the-art solutions. In particular, our software scales up to 1070 Gflop/s using 16 Intel E5-2670 cores and eight M2090 GPUs, compared to 45 Gflop/s achieved by the optimized Intel Math Kernel Library (MKL) using only the 16 CPU cores.

Original language: English
Title of host publication: ICS 2013 - Proceedings of the 2013 ACM International Conference on Supercomputing
Pages: 223-232
Number of pages: 10
State: Published - 2013
Event: 27th ACM International Conference on Supercomputing, ICS 2013 - Eugene, OR, United States
Duration: Jun 10 2013 to Jun 14 2013

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 27th ACM International Conference on Supercomputing, ICS 2013
Country/Territory: United States
City: Eugene, OR
Period: 06/10/13 to 06/14/13

Keywords

  • eigenvalue
  • gpu communication
  • gpu computation
  • heterogeneous programming model
  • performance
  • reduction to tridiagonal
  • singular value decomposition
  • task parallelism
