TY - GEN
T1 - Leading edge hybrid multi-GPU algorithms for generalized eigenproblems in electronic structure calculations
AU - Haidar, Azzam
AU - Solcà, Raffaele
AU - Gates, Mark
AU - Tomov, Stanimire
AU - Schulthess, Thomas
AU - Dongarra, Jack
PY - 2013
Y1 - 2013
AB - Today's high computational demands from engineering fields and complex hardware development make it necessary to develop and optimize new algorithms toward achieving high performance and good scalability on the next generation of computers. The enormous gap between the high-performance capabilities of GPUs and the slow interconnect between them has made the development of numerical software that is scalable across multiple GPUs extremely challenging. We describe and analyze a successful methodology for addressing these challenges, covering our algorithm design, kernel optimization and tuning, and programming model, in the development of a scalable, high-performance generalized eigenvalue solver for electronic structure calculations in materials science applications. As part of this generalized eigensolver, we developed a set of leading-edge dense linear algebra algorithms featuring fine-grained, memory-aware kernels, a task-based approach, and hybrid execution/scheduling. The goal of the new design is to increase the computational intensity of the major compute kernels and to reduce synchronization and data transfers between GPUs. We report the performance impact on the generalized eigensolver when different fractions of eigenvectors are needed. Using a single node with multiple GPUs, the algorithm described provides an enormous performance boost compared to current GPU-based solutions and performance comparable to state-of-the-art distributed solutions.
UR - http://www.scopus.com/inward/record.url?scp=84884488455&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-38750-0_6
DO - 10.1007/978-3-642-38750-0_6
M3 - Conference contribution
AN - SCOPUS:84884488455
SN - 9783642387494
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 67
EP - 80
BT - Supercomputing - 28th International Supercomputing Conference, ISC 2013, Proceedings
T2 - 28th International Supercomputing Conference on Supercomputing, ISC 2013
Y2 - 16 June 2013 through 20 June 2013
ER -