Abstract
The inverse design of novel molecules with a desirable optoelectronic property requires consideration of the vast chemical spaces associated with varying chemical composition and molecular size. First-principles-based property predictions have become increasingly helpful for assisting the selection of promising candidate chemical species for subsequent experimental validation. However, a brute-force computational screening of the entire chemical space is decidedly impossible. To alleviate the computational burden and accelerate rational molecular design, we here present an iterative deep learning workflow that combines (i) the density-functional tight-binding method for dynamic generation of property training data, (ii) a graph convolutional neural network surrogate model for rapid and reliable predictions of chemical and physical properties, and (iii) a masked language model. As proof of principle, we employ our workflow in the iterative generation of novel molecules with a target energy gap between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO).
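The three-part loop described in the abstract can be illustrated with a deliberately simplified sketch. Everything in it is an assumption made for illustration and none of it is the authors' code: molecules are plain strings over a toy alphabet, `reference_gap` is an arbitrary formula standing in for a density-functional tight-binding calculation, `Surrogate` is a nearest-neighbour lookup standing in for the graph convolutional neural network, and `propose` is a random single-token edit standing in for the masked language model.

```python
import random

# Hypothetical toy stand-ins throughout; not the published implementation.
ALPHABET = "CNO"  # a "molecule" is just a string over these tokens


def reference_gap(mol: str) -> float:
    """Stand-in for a DFTB calculation: an arbitrary toy 'gap' in eV."""
    gap = 8.0 - 0.5 * len(mol) + 0.3 * mol.count("N") - 0.2 * mol.count("O")
    return max(0.1, gap)


def features(mol: str) -> tuple:
    return (len(mol), mol.count("N"), mol.count("O"))


class Surrogate:
    """Stand-in for the GCNN surrogate: 1-nearest-neighbour on toy features."""

    def __init__(self):
        self.data = []

    def fit(self, labelled: dict) -> None:
        self.data = [(features(m), g) for m, g in labelled.items()]

    def predict(self, mol: str) -> float:
        f = features(mol)
        nearest = min(
            self.data,
            key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], f)),
        )
        return nearest[1]


def propose(mol: str, rng: random.Random) -> str:
    """Stand-in for the masked language model: edit one random position."""
    i = rng.randrange(len(mol))
    token = rng.choice(ALPHABET)
    if rng.random() < 0.5:
        return mol[:i] + token + mol[i + 1:]  # replace the "masked" token
    return mol[:i] + token + mol[i:]          # insert a new token


def design(target_gap, seeds, iterations=5, pool=50, keep=10, seed=0):
    rng = random.Random(seed)
    labelled = {m: reference_gap(m) for m in seeds}  # initial training data
    surrogate = Surrogate()
    population = list(seeds)
    for _ in range(iterations):
        surrogate.fit(labelled)  # retrain on all data gathered so far
        candidates = sorted({propose(rng.choice(population), rng)
                             for _ in range(pool)})
        # Rank candidates by predicted distance to the target gap.
        ranked = sorted(candidates,
                        key=lambda m: abs(surrogate.predict(m) - target_gap))
        population = ranked[:keep]
        for m in population:  # label the survivors with the reference method
            labelled[m] = reference_gap(m)
    best = min(labelled, key=lambda m: abs(labelled[m] - target_gap))
    return best, labelled[best]


if __name__ == "__main__":
    print(design(target_gap=4.0, seeds=["CCCC", "CCO"]))
```

The design point the sketch tries to capture is that the expensive reference calculation is only run on candidates the cheap surrogate has already ranked as promising, and each round of new labels feeds back into the surrogate's training data.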
Original language | English |
---|---|
Article number | 20031 |
Journal | Scientific Reports |
Volume | 13 |
Issue number | 1 |
State | Published - Dec 2023 |
Funding
We thank Andrew Blanchard (Amgen) for early ideas and code development, and Belinda Akpa (ORNL) for helpful discussions. This work was supported by the Artificial Intelligence Initiative as part of the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725. This material is also based upon work supported by the Office of Advanced Scientific Computing Research, Office of Science, and the Scientific Discovery through Advanced Computing (SciDAC) program. This work used resources of the Oak Ridge Leadership Computing Facility, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Support for DOI https://doi.org/10.13139/ORNLNCCS/1996925 dataset is provided by the U.S. Department of Energy, project BIF136 under Contract DE-AC05-00OR22725. Project BIF136 used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.
Funders | Funder number |
---|---|
Andrew Blanchard | |
Artificial Intelligence Initiative | |
Belinda Akpa | |
U.S. Department of Energy | BIF136, DE-AC05-00OR22725 |
Amgen | |
Office of Science | |
Advanced Scientific Computing Research | |
Oak Ridge National Laboratory | |