CASTELO: clustered atom subtypes aided lead optimization—a combined machine learning and molecular modeling method

Leili Zhang, Giacomo Domeniconi, Chih Chieh Yang, Seung gu Kang, Ruhong Zhou, Guojing Cong

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Background: Drug discovery is a multi-stage process that comprises two costly major steps: pre-clinical research and clinical trials. Among its stages, lead optimization easily consumes more than half of the pre-clinical budget. We propose a combined machine learning and molecular modeling approach that partially automates the lead optimization workflow in silico and suggests modification hot spots.

Results: The initial data are collected with physics-based molecular dynamics simulations. Contact matrices are calculated as the preliminary features extracted from the simulations. To take advantage of the temporal information in the simulations, we enhance the contact matrix data with a temporal dynamism representation, which is then modeled with an unsupervised convolutional variational autoencoder (CVAE). Finally, conventional and CVAE-based clustering methods are compared using metrics to rank the submolecular structures and propose potential candidates for lead optimization.

Conclusion: Without requiring extensive structure-activity data, our method provides new hints for drug modification hot spots, which can be used to improve drug potency and reduce lead optimization time. It can potentially become a valuable tool for medicinal chemists.
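The contact-matrix feature extraction mentioned in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `contact_matrix` and `contact_frequency`, the 4.5 Å cutoff, and the toy coordinates are all assumptions made for illustration. The time-averaged contact frequency is only a simple stand-in for using information across frames; it is not the temporal dynamism representation or the CVAE described in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_matrix(group_a: Sequence[Coord], group_b: Sequence[Coord],
                   cutoff: float = 4.5) -> List[List[int]]:
    """Binary contact matrix for one simulation frame.

    Entry [i][j] is 1 when atom i of group_a lies within `cutoff`
    (an assumed value, in the same units as the coordinates) of
    atom j of group_b, and 0 otherwise.
    """
    return [[1 if math.dist(a, b) <= cutoff else 0 for b in group_b]
            for a in group_a]


def contact_frequency(frames_a: Sequence[Sequence[Coord]],
                      frames_b: Sequence[Sequence[Coord]],
                      cutoff: float = 4.5) -> List[List[float]]:
    """Fraction of frames in which each atom pair is in contact."""
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("need the same non-zero number of frames per group")
    totals = None
    for a, b in zip(frames_a, frames_b):
        frame = contact_matrix(a, b, cutoff)
        if totals is None:
            totals = [[0] * len(row) for row in frame]
        for i, row in enumerate(frame):
            for j, value in enumerate(row):
                totals[i][j] += value
    return [[count / len(frames_a) for count in row] for row in totals]


# Toy data (hypothetical): two ligand atoms, three protein atoms, two frames.
protein = [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (20.0, 0.0, 0.0)]
ligand_frames = [
    [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (18.0, 0.0, 0.0)],
]
single_frame = contact_matrix(ligand_frames[0], protein)
frequency = contact_frequency(ligand_frames, [protein, protein])
```

In this toy case the first ligand atom contacts the first two protein atoms in both frames, while the second ligand atom contacts the third protein atom only in the second frame, so that pair has a contact frequency of 0.5.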

Original language: English
Article number: 338
Journal: BMC Bioinformatics
Volume: 22
Issue number: 1
DOIs
State: Published - Dec 2021

Funding

We thank Wendy Cornell for suggestions and discussions on the topic of drug discovery. We thank Josef Klucik and Paul Winget for discussions on the subject of sweeteners. R.Z. and G.C. gratefully acknowledge the financial support from the IBM BlueGene Science Program (W125859, W1464125 and W1464164), Computing Cloud Clusters and the Witherspoon supercomputer at IBM.

Funders | Funder number
IBM BlueGene Science Program | W1464164, W1464125, W125859
International Business Machines Corporation |

    Keywords

    • Clustering
    • Drug discovery
    • Lead optimization
    • Machine learning
    • Molecular dynamics simulation
    • Variational autoencoder
