Abstract
Inverse molecular design faces significant challenges due to the vast chemical space and complex property requirements. While language models show promise for molecular generation, they struggle with validity, multi-property optimization, and structural constraints. This work presents RLMolLM, a reinforcement learning framework that combines Proximal Policy Optimization (PPO) with genetic algorithms to address these limitations. Our approach optimizes multiple user-specified properties, including the quantitative estimate of drug-likeness (QED), synthetic accessibility (SA), and ADMET (absorption, distribution, metabolism, excretion, and toxicity) endpoints, without requiring complete model retraining, while retaining the capability for scaffold-constrained generation, in which specific substructures must be preserved. We outperform state-of-the-art methods for molecular optimization, achieving the best QED scores across the GDB13, Moses, and Zinc datasets with up to 31% improvement over previous methods while maintaining excellent validity, uniqueness, and novelty metrics. For simultaneous multi-property optimization, our framework achieves substantial improvements in ADMET properties, including a 4.5-fold reduction in hERG toxicity and enhanced Caco-2 permeability relative to the Moses dataset. Under structural constraints, the framework significantly improves molecular validity while preserving scaffolds and effectively optimizing properties. This versatile solution advances pharmaceutical and materials molecular design by integrating reinforcement learning and genetic algorithms with multi-property optimization and scaffold preservation.
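The abstract does not state how the individual property scores are combined into a training signal. As a purely illustrative sketch, one common option is a normalized weighted sum that an RL loop such as PPO could use as a scalar reward. The names `Objective`, `combined_reward`, and `OBJECTIVES`, the equal weights, and the choice of a weighted sum are all assumptions for illustration and are not taken from RLMolLM; property values are assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """One property to optimize (hypothetical helper, not from the paper)."""
    name: str
    weight: float
    low: float            # lower bound used for normalization
    high: float           # upper bound used for normalization
    maximize: bool = True

    def score(self, value: float) -> float:
        # Clip to [low, high], rescale to [0, 1], flip if lower is better.
        clipped = min(max(value, self.low), self.high)
        scaled = (clipped - self.low) / (self.high - self.low)
        return scaled if self.maximize else 1.0 - scaled


def combined_reward(properties, objectives, is_valid=True):
    """Weighted mean of normalized property scores; invalid molecules get 0."""
    if not is_valid:
        return 0.0
    total_weight = sum(o.weight for o in objectives)
    weighted = sum(o.weight * o.score(properties[o.name]) for o in objectives)
    return weighted / total_weight


# Assumed ranges: QED in [0, 1] (higher is better), SA score in [1, 10]
# (lower is easier to synthesize), hERG blocking probability in [0, 1]
# (lower is better). Weights are arbitrary placeholders.
OBJECTIVES = [
    Objective("qed", weight=1.0, low=0.0, high=1.0, maximize=True),
    Objective("sa", weight=1.0, low=1.0, high=10.0, maximize=False),
    Objective("herg", weight=1.0, low=0.0, high=1.0, maximize=False),
]

if __name__ == "__main__":
    # Hypothetical property values for a single generated molecule.
    example = {"qed": 0.8, "sa": 2.8, "herg": 0.1}
    print(combined_reward(example, OBJECTIVES))
```

Because each objective is declared as data, a user could add or reweight properties without touching the reward function, which is in the spirit of optimizing user-specified properties without retraining from scratch.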
| Original language | English |
|---|---|
| Pages (from-to) | 12292-12304 |
| Number of pages | 13 |
| Journal | Journal of Chemical Information and Modeling |
| Volume | 65 |
| Issue number | 22 |
| State | Published - Nov 24 2025 |
Funding
Research was sponsored by the US Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division [FWP# ERKCK60], under contract DE-AC05-00OR22725 with UT-Battelle, LLC. Research was also sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US Government retains, and the publisher, by accepting the article for publication, acknowledges that the US Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US Government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).