TY - JOUR
T1 - A High-Efficiency Delayed Update Algorithm for Evaluating Slater Determinants in Quantum Monte Carlo
AU - Luo, Ye
AU - Kim, Jeongnim
AU - Kent, Paul R.C.
PY - 2025/12/9
Y1 - 2025/12/9
N2 - For quantum Monte Carlo simulations of molecular systems or supercells with thousands of electrons, matrix operations related to Slater determinants lead the computational cost. McDaniel et al. [J. Chem. Phys. 2017, 147, 174107] proposed a delayed update algorithm to increase computational efficiency by using matrix-matrix multiplication when updating the inverse matrices of Slater determinants. However, preparing intermediate matrices for applying the Sherman-Morrison-Woodbury formula remained a bottleneck. In this work, we introduce an improved algorithm for CPUs and GPUs that (1) reduces this bottleneck by iteratively updating the intermediate matrices and (2) is efficient at any acceptance ratio, with no cost for rejected moves on CPUs and minimal cost on GPUs. We show the full scheme of integrating the delayed update algorithm into a single-electron move. The high efficiency of our algorithm is demonstrated on CPUs and GPUs for a 512 atom/6144 valence electron calculation, with 12× and 2× overall speed-up compared to traditional rank-1 update schemes in diffusion quantum Monte Carlo, respectively.
AB - For quantum Monte Carlo simulations of molecular systems or supercells with thousands of electrons, matrix operations related to Slater determinants lead the computational cost. McDaniel et al. [J. Chem. Phys. 2017, 147, 174107] proposed a delayed update algorithm to increase computational efficiency by using matrix-matrix multiplication when updating the inverse matrices of Slater determinants. However, preparing intermediate matrices for applying the Sherman-Morrison-Woodbury formula remained a bottleneck. In this work, we introduce an improved algorithm for CPUs and GPUs that (1) reduces this bottleneck by iteratively updating the intermediate matrices and (2) is efficient at any acceptance ratio, with no cost for rejected moves on CPUs and minimal cost on GPUs. We show the full scheme of integrating the delayed update algorithm into a single-electron move. The high efficiency of our algorithm is demonstrated on CPUs and GPUs for a 512 atom/6144 valence electron calculation, with 12× and 2× overall speed-up compared to traditional rank-1 update schemes in diffusion quantum Monte Carlo, respectively.
UR - https://www.scopus.com/pages/publications/105024259278
U2 - 10.1021/acs.jctc.5c01541
DO - 10.1021/acs.jctc.5c01541
M3 - Article
C2 - 41325396
AN - SCOPUS:105024259278
SN - 1549-9618
VL - 21
SP - 12064
EP - 12070
JO - Journal of Chemical Theory and Computation
JF - Journal of Chemical Theory and Computation
IS - 23
ER -