## Abstract

We present DFT-FE 1.0, building on DFT-FE 0.6 (Motamarri et al., 2020 [28]), to conduct fast and accurate large-scale density functional theory (DFT) calculations (reaching ∼100,000 electrons) on both many-core CPU and hybrid CPU-GPU computing architectures. This work involves improvements in the real-space formulation, via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency, as well as in high-performance computing aspects, including GPU acceleration of all the key compute kernels in DFT-FE. We demonstrate the accuracy by comparing the ground-state energies, ionic forces and cell stresses on a wide range of benchmark systems against those obtained from widely used DFT codes. Further, we demonstrate the numerical efficiency of our GPU acceleration, which yields a ∼20× speed-up on hybrid CPU-GPU nodes of the Summit supercomputer. Notably, owing to the parallel scaling of the GPU implementation, we obtain wall-times of 80–140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing ∼6,000–15,000 electrons using 64–224 nodes of the Summit supercomputer.

### Program summary

- Program Title: DFT-FE
- CPC Library link to program files: https://doi.org/10.17632/c5ghfc6ctn.1
- Developer's repository link: https://github.com/dftfeDevelopers/dftfe
- Licensing provisions: LGPL v3
- Programming language: C/C++
- External routines/libraries: p4est (http://www.p4est.org/), deal.II (https://www.dealii.org/), BLAS (http://www.netlib.org/blas/), LAPACK (http://www.netlib.org/lapack/), ELPA (https://elpa.mpcdf.mpg.de/), ScaLAPACK (http://www.netlib.org/scalapack/), Spglib (https://atztogo.github.io/spglib/), ALGLIB (http://www.alglib.net/), LIBXC (http://www.tddft.org/programs/libxc/), PETSc (https://www.mcs.anl.gov/petsc), SLEPc (http://slepc.upv.es), NCCL (optional, https://github.com/NVIDIA/nccl)

Nature of problem: Density functional theory calculations.

Solution method: We employ a local real-space variational formulation of Kohn-Sham density functional theory that is applicable to both pseudopotential and all-electron calculations on periodic, semi-periodic and non-periodic geometries. A higher-order adaptive spectral finite-element basis is used to discretize the Kohn-Sham equations. The Chebyshev polynomial filtered subspace iteration procedure (ChFSI) is employed to solve the nonlinear Kohn-Sham eigenvalue problem self-consistently. ChFSI in DFT-FE employs Cholesky-factorization-based orthonormalization and a spectrum-splitting-based Rayleigh-Ritz procedure, in conjunction with mixed-precision arithmetic. A configurational force approach is used to compute ionic forces and periodic cell stresses for geometry optimization.

Additional comments including restrictions and unusual features: Exchange-correlation functionals are restricted to the Local Density Approximation (LDA) and the Generalized Gradient Approximation (GGA), with and without spin. The pseudopotentials available are optimized norm-conserving Vanderbilt (ONCV) pseudopotentials and Troullier–Martins (TM) pseudopotentials. Calculations are non-relativistic. DFT-FE handles all-electron and pseudopotential calculations in the same framework, while accommodating periodic, non-periodic and semi-periodic boundary conditions.
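To make the ChFSI step named in the solution method concrete, the following is a minimal, illustrative sketch in pure Python. It is not DFT-FE's implementation: the "Hamiltonian" is a toy 1D Laplacian stencil, the function names (`chebyshev_filter`, `cholesky_orthonormalize`, `jacobi_eigh`, `chfsi`) are hypothetical, and the filter bounds `a` and `b` are hand-picked for this toy matrix rather than estimated as a production code would. Spectrum splitting, mixed precision and the self-consistent field loop are omitted.

```python
import math
import random

def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply_h(X):
    """Apply a toy Hamiltonian (1D Laplacian stencil: 2 on the diagonal, -1 off it)."""
    n, k = len(X), len(X[0])
    Y = [[2.0 * v for v in row] for row in X]
    for i in range(n):
        for j in range(k):
            if i > 0:
                Y[i][j] -= X[i - 1][j]
            if i < n - 1:
                Y[i][j] -= X[i + 1][j]
    return Y

def chebyshev_filter(X, degree, a, b):
    """Apply T_degree((H - c)/e) to X; eigenvalues inside [a, b] are damped."""
    c, e = 0.5 * (a + b), 0.5 * (b - a)
    def shifted(Z):  # (H - c I) Z / e
        HZ = apply_h(Z)
        return [[(h - c * z) / e for h, z in zip(hr, zr)] for hr, zr in zip(HZ, Z)]
    prev, cur = X, shifted(X)
    for _ in range(2, degree + 1):  # three-term Chebyshev recurrence
        S = shifted(cur)
        prev, cur = cur, [[2.0 * s - p for s, p in zip(sr, pr)] for sr, pr in zip(S, prev)]
    return cur

def cholesky_orthonormalize(X):
    """Orthonormalize the columns of X via S = X^T X = L L^T, X <- X L^{-T}."""
    k = len(X[0])
    S = matmul(transpose(X), X)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    out = []
    for x in X:  # each new row y solves L y = x (forward substitution)
        y = []
        for i in range(k):
            y.append((x[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
        out.append(y)
    return out

def jacobi_eigh(A, sweeps=30):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations."""
    k = len(A)
    A = [row[:] for row in A]
    Q = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        if sum(A[p][q] ** 2 for p in range(k) for q in range(p + 1, k)) < 1e-26:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                d = A[q][q] - A[p][p]
                theta = math.pi / 4 if d == 0 else 0.5 * math.atan(2 * A[p][q] / d)
                J = [[float(i == j) for j in range(k)] for i in range(k)]
                J[p][p] = J[q][q] = math.cos(theta)
                J[p][q], J[q][p] = math.sin(theta), -math.sin(theta)
                A = matmul(transpose(J), matmul(A, J))  # zeroes A[p][q]
                Q = matmul(Q, J)
    order = sorted(range(k), key=lambda i: A[i][i])
    return [A[i][i] for i in order], [[row[i] for i in order] for row in Q]

def chfsi(n, k, degree, a, b, iterations, seed=0):
    """Toy ChFSI loop: filter, Cholesky-orthonormalize, then Rayleigh-Ritz."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    ritz = []
    for _ in range(iterations):
        X = cholesky_orthonormalize(chebyshev_filter(X, degree, a, b))
        Hp = matmul(transpose(X), apply_h(X))  # projected Hamiltonian X^T H X
        ritz, Q = jacobi_eigh(Hp)
        X = matmul(X, Q)                       # rotate the basis to Ritz vectors
    return ritz, X

n, k = 20, 4
# Assumed bounds: a lies between the 4th and 5th eigenvalues, b bounds the spectrum.
ritz, X = chfsi(n, k, degree=12, a=0.45, b=4.0, iterations=8)
exact = [2.0 - 2.0 * math.cos(j * math.pi / (n + 1)) for j in range(1, k + 1)]
for r, ex in zip(ritz, exact):
    print(f"{r:.10f}  {ex:.10f}")
```

The script prints each computed Ritz value next to the analytically known eigenvalue of the toy matrix. The filter amplifies the wanted (lowest) part of the spectrum relative to the interval [a, b], which is why a cheap Cholesky-based orthonormalization followed by a small dense eigenproblem is enough to recover the lowest eigenpairs.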

Field | Value |
---|---|
Original language | English |
Article number | 108473 |
Journal | Computer Physics Communications |
Volume | 280 |
State | Published - Nov 2022 |

### Funding

Vikram Gavini reports financial support was provided by Toyota Research Institute. We thank B. Kanungo, C.-C. Lin, K. Ramakrishnan, N. Kodali and N. Rufus for independently testing DFT-FE 1.0, providing useful feedback, and providing reference data for validating the all-electron calculations reported in this work. We gratefully acknowledge the support from the Department of Energy, Office of Basic Energy Sciences (Award number DE-SC0008637) and the Toyota Research Institute that funded the development of DFT-FE 1.0. V.G. gratefully acknowledges the support of the Air Force Office of Scientific Research (Grant number FA9550-21-1-0302) that supported the work on improved electrostatics treatment. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant number ACI-1053575. V.G. gratefully acknowledges the support of the Army Research Office through the DURIP grant W911NF1810242, which also provided computational resources for this work. P.M. gratefully acknowledges the seed grant from the Indian Institute of Science and the SERB Startup Research Grant from the Department of Science and Technology, India (Grant Number: SRG/2020/002194) for the purchase of a GPU cluster, which also provided computational resources for this work.

Funders | Funder number |
---|---|
National Science Foundation | ACI-1053575 |
U.S. Department of Energy | DE-AC02-05CH11231 |
Air Force Office of Scientific Research | FA9550-21-1-0302 |
Army Research Office | W911NF1810242 |
Office of Science | DE-AC05-00OR22725 |
Basic Energy Sciences | DE-SC0008637 |
Indian Institute of Science | |
Toyota Research Institute | |
Science and Engineering Research Board, Department of Science and Technology, India | SRG/2020/002194 |

## Keywords

- All-electron
- Electronic structure
- GPU
- Mixed-precision arithmetic
- Pseudopotential
- Real-space
- Spectral finite-elements