Abstract
Water molecules play a significant role in maintaining protein structural stability and facilitating molecular interactions. Accurate prediction of water molecule positions around protein structures is essential for understanding their biological roles and has significant implications for protein engineering and drug discovery. Here, we introduce SuperWater, a novel generative AI framework that integrates a score-based diffusion model with equivariant graph neural networks to predict water molecule placements around proteins with high accuracy. SuperWater surpasses existing methods, delivering state-of-the-art performance in both crystal water coverage and prediction precision, achieving water localization within 0.3 ± 0.06 Å of experimentally validated positions. We demonstrate the capabilities of SuperWater through case studies involving protein hydration, protein-ligand binding, and protein-protein binding sites. This framework can be adapted for various applications, including structural biology, binding site prediction, multi-body docking, and water-mediated drug design.
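As a rough illustration of the two quantities the abstract reports, crystal water coverage and prediction precision, the following minimal sketch scores a set of predicted water positions against crystal waters. The function names, the 1.0 Å matching cutoff, and the exact metric definitions are assumptions made for illustration; they are not taken from the SuperWater paper or code.

```python
import math


def nearest_distance(point, others):
    """Distance (in Å) from `point` to the closest coordinate in `others`."""
    return min(math.dist(point, other) for other in others)


def evaluate_waters(predicted, crystal, cutoff=1.0):
    """Score predicted water oxygens against crystal waters.

    Assumed definitions (hypothetical, not from the paper):
      coverage   = fraction of crystal waters with a prediction within `cutoff`
      precision  = fraction of predictions within `cutoff` of a crystal water
      mean_error = mean nearest-prediction distance over covered crystal waters
    """
    if not predicted or not crystal:
        raise ValueError("both coordinate lists must be non-empty")

    crystal_d = [nearest_distance(c, predicted) for c in crystal]
    predicted_d = [nearest_distance(p, crystal) for p in predicted]

    covered = [d for d in crystal_d if d <= cutoff]
    coverage = len(covered) / len(crystal)
    precision = sum(d <= cutoff for d in predicted_d) / len(predicted)
    mean_error = sum(covered) / len(covered) if covered else float("nan")
    return coverage, precision, mean_error


# Toy coordinates in Å, invented purely for this example.
crystal_waters = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0)]
predicted_waters = [(0.3, 0.0, 0.0), (5.0, 0.4, 0.0), (10.0, 10.0, 10.0), (9.0, 9.0, 9.0)]

coverage, precision, mean_error = evaluate_waters(predicted_waters, crystal_waters)
print(f"coverage={coverage:.2f} precision={precision:.2f} mean_error={mean_error:.2f} Å")
```

Under these assumed definitions, a tighter cutoff lowers both coverage and precision, so any comparison between methods is only meaningful at a shared cutoff.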
| Original language | English |
|---|---|
| Article number | 397 |
| Journal | Communications Chemistry |
| Volume | 8 |
| Issue number | 1 |
| State | Published - Dec 2025 |
| Externally published | Yes |
Funding
Z.S. acknowledges the support of the Vanderbilt Data Science Postdoctoral Fellowship. X.L. and X.K. are grateful for the research funding and support provided by the Vanderbilt Data Science Institute. Y.L. acknowledges the Nvidia hardware grant for accelerating the project development. X.L. also thanks the John R. Hall Professorship Endowment in Chemical Engineering for its support. J.L. expresses gratitude for the project opportunity provided by the Vanderbilt Data Science Institute. We sincerely thank Umang Chaudhry for facilitating access to these resources. The authors thank Tommi Jaakkola and Gabriele Corso at the Massachusetts Institute of Technology for their help and guidance. We also acknowledge the computational resources (DGX A100) provided by the Vanderbilt Data Science Institute. J.M. is supported by a Humboldt Professorship of the Alexander von Humboldt Foundation. J.M. acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG) through SFB 1423 (421152132), SFB 1664 (514901783), TRR (514664767), and SPP 2363 (460865652). J.M. is supported by the Federal Ministry of Education and Research (BMBF) through the Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), through the German Network for Bioinformatics Infrastructure (de.NBI), and through the German Academic Exchange Service (DAAD) via the School of Embedded Composite AI (SECAI 15766814). Work in the Meiler laboratory is further supported by the National Institutes of Health (NIH) through grants R01 HL122010, R01 DA046138, R01 AG068623, U01 AI150739, R01 CA227833, R01 LM013434, S10 OD016216, S10 OD020154, and S10 OD032234.