Abstract
Deep learning models are efficient computational tools that can accelerate the inverse design of molecules with desired functional properties, because they generate predictions in a fraction of the time required by traditional quantum chemical approaches. To maintain accuracy and transferability across the broad regions of chemical space explored during inverse design, a model must be trained on very large volumes of simulation data. Collecting these data requires running large-scale ensembles of quantum chemical calculations on high-performance computing (HPC) systems. Executing such ensembles efficiently and managing the resulting volume of output data, however, require tools that use computational resources judiciously and limit metadata overhead on the file system. We therefore present a high-performance, scalable ensemble management framework for performing data-intensive quantum chemical electronic structure calculations on organic molecules. The framework provides abstractions for plugging in different ab initio, first-principles, and first-principles-based semi-empirical methods, and it executes them efficiently at large scale on HPC systems. It dynamically distributes tasks to resources and uses tiered storage to manage large collections of files. We used this framework to process over ten million organic molecules and to generate open-source datasets of UV-vis absorption spectra from time-dependent density-functional tight-binding (TD-DFTB) calculations. The result is the largest database of molecular optical spectra simulated with quantum chemical methods in a consistent manner.
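The abstract names two mechanisms: dynamic task distribution and file management that limits metadata overhead. The following is a minimal, hypothetical Python sketch of the general idea only, not the authors' framework. Threads stand in for HPC workers. `run_calculation` is a placeholder for one electronic structure job, such as a TD-DFTB run. Workers pull the next molecule from a shared queue as soon as they finish, so uneven job costs balance out automatically. As a simplification of tiered storage, results are aggregated into a few JSON Lines shards rather than one file per molecule. All function names and the sharding scheme are illustrative assumptions.

```python
import json
import os
import queue
import tempfile
import threading
import time
from collections import defaultdict


def run_calculation(smiles: str) -> dict:
    """Hypothetical placeholder for one quantum chemical job (e.g. TD-DFTB)."""
    # Simulate a cost that varies from molecule to molecule.
    time.sleep(0.001 * len(smiles))
    return {"smiles": smiles, "cost_proxy": len(smiles)}


def _worker(worker_id, tasks, results, lock, counts):
    # Pull-based (dynamic) scheduling: take a new task only when idle.
    while True:
        try:
            smiles = tasks.get_nowait()
        except queue.Empty:
            return  # all tasks were enqueued up front, so empty means done
        record = run_calculation(smiles)
        with lock:
            results.append(record)
            counts[worker_id] += 1


def distribute(molecules, n_workers=4):
    """Run all molecules on n_workers threads; return results and per-worker task counts."""
    tasks = queue.Queue()
    for m in molecules:
        tasks.put(m)
    results, counts, lock = [], defaultdict(int), threading.Lock()
    threads = [
        threading.Thread(target=_worker, args=(i, tasks, results, lock, counts))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, dict(counts)


def write_shards(records, out_dir, shard_size=1000):
    """Aggregate many small results into a few files to reduce file-system metadata load."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(records), shard_size):
        path = os.path.join(out_dir, f"shard_{start // shard_size:05d}.jsonl")
        with open(path, "w", encoding="utf-8") as fh:
            for rec in records[start:start + shard_size]:
                fh.write(json.dumps(rec) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    mols = ["C", "CC", "CCO", "c1ccccc1", "CC(=O)O", "CCN", "O", "CCCCCC"]
    results, counts = distribute(mols, n_workers=3)
    with tempfile.TemporaryDirectory() as d:
        shards = write_shards(results, d, shard_size=3)
        print(f"{len(results)} results in {len(shards)} files; tasks per worker: {counts}")
```

Pulling tasks from a shared queue, rather than assigning each worker a fixed slice in advance, keeps workers busy when individual calculations differ widely in runtime. Writing shards instead of one file per molecule keeps the file count proportional to `len(records) / shard_size`.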
| Original language | English |
|---|---|
| Title of host publication | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 1047-1056 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798350364606 |
| State | Published - 2024 |
| Event | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024, San Francisco, United States (duration: May 27, 2024 → May 31, 2024) |
Publication series
| Name | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
|---|---|
Conference
| Conference | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024 |
|---|---|
| Country/Territory | United States |
| City | San Francisco |
| Period | 05/27/24 → 05/31/24 |
Funding
This manuscript has been authored in part by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- first principles
- HPC
- quantum chemistry
- workflows