TY - GEN
T1 - Scaling Ensembles of Data-Intensive Quantum Chemical Calculations for Millions of Molecules
AU - Mehta, Kshitij
AU - Pasini, Massimiliano Lupo
AU - Irle, Stephan
AU - Yoo, Pilsun
AU - Suter, Frederic
AU - Ganyushin, Dmitry
AU - Klasky, Scott
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Deep learning models are efficient computational tools that can accelerate the inverse design of molecules with desired functional properties by generating predictions at a fraction of the time required by traditional quantum chemical approaches. To ensure that a model maintains accuracy and transferability across broad regions of the chemical space explored during the inverse design, it must be trained on massively large volumes of simulation data. This requires running large-scale ensemble quantum chemical calculations on high-performance computing (HPC) systems for data collection. However, the efficient execution of such large ensemble calculations and the management of large volumes of output data require tools that can judiciously utilize computational resources and manage metadata overhead on the file system. Therefore, we present a high-performance, scalable, ensemble management framework for performing data-intensive quantum chemical electronic structure calculations for organic molecules. This framework provides abstractions to plug different ab initio, first principles, and first principles-based semi-empirical methods and executes them efficiently at large scale on HPC systems. It dynamically distributes tasks to resources and uses tiered storage for managing large collections of files. We employed this framework to process over ten million organic molecules and generate open-source datasets that provide UV-vis absorption spectra by running time-dependent density-functional tight-binding calculations. It is the largest database containing molecular optical spectra that were simulated with quantum chemical methods in a consistent manner.
KW - first principles
KW - hpc
KW - quantum chemistry
KW - workflows
UR - http://www.scopus.com/inward/record.url?scp=85200746197&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW63119.2024.00175
DO - 10.1109/IPDPSW63119.2024.00175
M3 - Conference contribution
AN - SCOPUS:85200746197
T3 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
SP - 1047
EP - 1056
BT - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024
Y2 - 27 May 2024 through 31 May 2024
ER -