TY - JOUR
T1 - Enhancing molecular design efficiency
T2 - Uniting language models and generative networks with genetic algorithms
AU - Bhowmik, Debsindhu
AU - Zhang, Pei
AU - Fox, Zachary
AU - Irle, Stephan
AU - Gounley, John
N1 - Publisher Copyright: © 2024 Oak Ridge National Laboratory
PY - 2024/4/12
Y1 - 2024/4/12
AB - This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.
KW - generative adversarial network
KW - genetic algorithm
KW - masked language model
KW - molecule design
UR - http://www.scopus.com/inward/record.url?scp=85189036306&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2024.100947
DO - 10.1016/j.patter.2024.100947
M3 - Article
AN - SCOPUS:85189036306
SN - 2666-3899
VL - 5
JO - Patterns
JF - Patterns
IS - 4
M1 - 100947
ER -