Using GANs with adaptive training data to search for new molecules

Research output: Contribution to journal › Article › peer-review

37 Scopus citations

Abstract

The process of drug discovery involves a search over the space of all possible chemical compounds. Generative Adversarial Networks (GANs) provide a valuable tool towards exploring chemical space and optimizing known compounds for a desired functionality. Standard approaches to training GANs, however, can result in mode collapse, in which the generator primarily produces samples closely related to a small subset of the training data. In contrast, the search for novel compounds necessitates exploration beyond the original data. Here, we present an approach to training GANs that promotes incremental exploration and limits the impacts of mode collapse using concepts from Genetic Algorithms. In our approach, valid samples from the generator are used to replace samples from the training data. We consider both random and guided selection along with recombination during replacement. By tracking the number of novel compounds produced during training, we show that updates to the training data drastically outperform the traditional approach, increasing potential applications for GANs in drug discovery.
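The replacement step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`update_training_data`, `recombine`, `toy_is_valid`), the use of SMILES-like strings, the parenthesis-balance validity check, and the scoring callback are all assumptions made for illustration. In practice a cheminformatics toolkit would decide validity, and the GAN would be retrained on the updated data between calls.

```python
import random
from typing import Callable, List, Optional, Sequence, Set


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in for a real validity check: non-empty, balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def recombine(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """One-point crossover on strings: a prefix of one parent joined to a suffix of the other.

    The child is not guaranteed to be valid and must be checked before use.
    """
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    return parent_a[:cut_a] + parent_b[cut_b:]


def update_training_data(
    training: List[str],
    generated: Sequence[str],
    is_valid: Callable[[str], bool],
    seen: Set[str],
    score: Optional[Callable[[str], float]] = None,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace entries of `training` in place with valid, novel generated samples.

    `seen` holds every compound observed so far and is updated in place, so its
    growth tracks the number of novel compounds produced during training.
    With `score=None`, a uniformly random training sample is overwritten
    (random selection). With a `score` function, the lowest-scoring training
    sample is overwritten, and only if the candidate scores higher (guided
    selection). Returns the list of accepted candidates.
    """
    rng = rng or random.Random()
    accepted: List[str] = []
    for candidate in generated:
        if candidate in seen or not is_valid(candidate):
            continue  # skip duplicates of known compounds and invalid strings
        if score is None:
            idx = rng.randrange(len(training))
        else:
            idx = min(range(len(training)), key=lambda i: score(training[i]))
            if score(candidate) <= score(training[idx]):
                continue  # candidate does not improve on the weakest sample
        training[idx] = candidate
        seen.add(candidate)
        accepted.append(candidate)
    return accepted
```

A training loop under these assumptions would alternate between fitting the GAN, sampling from the generator (optionally adding `recombine` children of generated and training samples to the candidate pool), and calling `update_training_data`. The training set keeps a fixed size, so each accepted candidate shifts the data distribution slightly rather than enlarging it.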

Original language: English
Article number: 14
Journal: Journal of Cheminformatics
Volume: 13
Issue number: 1
State: Published - Dec 2021

Funding

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

The research was supported by the U.S. Department of Energy, Office of Science, through the Office of Advanced Scientific Computing Research (ASCR), under contract number DE-AC05-00OR22725; the Exascale Computing Project (ECP) (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration; and in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. It was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, Oak Ridge National Laboratory under Contract DE-AC05-00OR22725, and Frederick National Laboratory for Cancer Research under Contract HHSN261200800001E.

Funder: Funder number
DOE Public Access Plan
National Institutes of Health
U.S. Department of Energy
National Cancer Institute
Office of Science
National Nuclear Security Administration
Advanced Scientific Computing Research: 17-SC-20-SC, DE-AC05-00OR22725
Argonne National Laboratory: DE-AC02-06-CH11357
Lawrence Livermore National Laboratory: DE-AC52-07NA27344
Oak Ridge National Laboratory
Los Alamos National Laboratory: DE-AC5206NA25396
Frederick National Laboratory for Cancer Research: HHSN261200800001E
UT-Battelle

    Keywords

    • Drug discovery
    • Generative Adversarial Network
    • Search
