TY - JOUR
T1 - The GPU-enabled divide-expand-consolidate RI-MP2 method (DEC-RI-MP2)
AU - Bykov, Dmytro
AU - Kjaergaard, Thomas
N1 - Publisher Copyright: © 2016 Wiley Periodicals, Inc.
PY - 2017/2/5
Y1 - 2017/2/5
AB - We report porting of the Divide-Expand-Consolidate Resolution of the Identity second-order Møller–Plesset perturbation (DEC-RI-MP2) method to the graphic processing units (GPUs) using OpenACC compiler directives. It is shown that the OpenACC compiler directives implementation efficiently accelerates the rate-determining step of the DEC-RI-MP2 method with minor implementation effort. Moreover, the GPU acceleration results in a better load balance and thus in an overall scaling improvement of the DEC algorithm. The resulting cross-platform hybrid MPI/OpenMP/OpenACC implementation has scalable and portable performance on heterogeneous HPC architectures. The GPU-enabled code was benchmarked using a reduced version of the S12L test set of Stefan Grimme (Grimme, Chem. Eur. J. 2012, 18, 9955) consisting of supramolecular complexes up to 158 atoms and 4292 contracted basis functions (cc-pVTZ). The test set results demonstrate the general applicability of the DEC-RI-MP2 method showing results consistent with the DEC-RI-MP2 introductory paper (Baudin et al., J. Chem. Phys. 2016, 144, 054102) on molecules of complicated electronic structures.
KW - MP2
KW - graphic processing units
KW - heterogeneous architectures
KW - parallel implementation
UR - http://www.scopus.com/inward/record.url?scp=85006493413&partnerID=8YFLogxK
U2 - 10.1002/jcc.24678
DO - 10.1002/jcc.24678
M3 - Article
C2 - 27925252
AN - SCOPUS:85006493413
SN - 0192-8651
VL - 38
SP - 228
EP - 237
JO - Journal of Computational Chemistry
JF - Journal of Computational Chemistry
IS - 4
ER -