TY - GEN
T1 - Fast, scalable and accurate finite-element based ab initio calculations using mixed precision computing
T2 - 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019
AU - Das, Sambit
AU - Motamarri, Phani
AU - Gavini, Vikram
AU - Turcksin, Bruno
AU - Li, Ying Wai
AU - Leback, Brent
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/11/17
Y1 - 2019/11/17
N2 - Accurate large-scale first-principles calculations based on density functional theory (DFT) in metallic systems are prohibitively expensive due to the asymptotic cubic scaling of computational complexity with the number of electrons. Using algorithmic advances in finite-element discretization for DFT (DFT-FE), in conjunction with efficient computational methodologies and mixed precision strategies, we delay the onset of this cubic scaling by significantly reducing the computational prefactor while increasing the arithmetic intensity and lowering the data movement costs. This has enabled fast, accurate and massively parallel DFT calculations on large-scale metallic systems on both many-core and heterogeneous architectures, with time-to-solution an order of magnitude faster than state-of-the-art plane-wave DFT codes. We demonstrate an unprecedented sustained performance of 46 PFLOPS (27.8% of peak FP64 performance) on a dislocation system in magnesium containing 105,080 electrons using 3,800 GPU nodes of the Summit supercomputer, which is the highest performance to date among DFT codes.
KW - Density functional theory
KW - Finite-elements
KW - Heterogeneous architectures
KW - Light-weight alloys
KW - Mixed precision
KW - Scalability
UR - http://www.scopus.com/inward/record.url?scp=85076126709&partnerID=8YFLogxK
U2 - 10.1145/3295500.3357157
DO - 10.1145/3295500.3357157
M3 - Conference contribution
AN - SCOPUS:85076126709
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2019
PB - IEEE Computer Society
Y2 - 17 November 2019 through 22 November 2019
ER -