Abstract
Cryoelectron microscopy requires molecular modeling for refinement of structures. Ensemble models arrive at low free-energy molecular structures, but are computationally expensive and limited to resolving only small proteins. We introduce CryoFold, a pipeline of molecular dynamics simulations that determines ensembles of protein structures by integrating density data of varying sparsity at 3–5 Å resolution with sequence information and coarse-grained topological knowledge of the protein folds. We present six examples, folding proteins between 72 and 2,000 residues, including large membrane and multi-domain systems, and results from two Electron Microscopy Data Bank (EMDB) competitions. Driven by data from a single state, CryoFold discovers ensembles of common low-energy models together with rare low-probability structures that capture the equilibrium distribution of proteins constrained by the density maps. Many of these conformations are experimentally validated and functionally relevant. We arrive at a set of best practices for data-guided protein folding that are controlled using a Python graphical user interface (GUI).
Original language | English |
---|---|
Pages (from-to) | 3195-3216 |
Number of pages | 22 |
Journal | Matter |
Volume | 4 |
Issue number | 10 |
State | Published - Oct 6 2021 |
Externally published | Yes |
Funding
A.S. and C.G. acknowledge start-up funds from the SMS and Biodesign Center for Applied Structure Discovery at Arizona State University, CAREER award by NSF-MCB 1942763, and the resources of the OLCF at the Oak Ridge National Laboratory, which is supported by the Office of Science at DOE under contract no. DE-AC05-00OR22725, made available via the INCITE program. The ET laboratory is supported by NIH (P41-GM104601); E.T., A.S., and M.S. acknowledge NIH (R01-GM067887). This research is part of the Blue Waters Sustained Petascale Computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. K.D. and A.P. appreciate support from a PRAC computer allocation supported by NSF award ACI1514873, support from NIH grant GM125813, and the Laufer Center. A.P. appreciates start-up support from the University of Florida. D.K. acknowledges support from the NIH (R01GM133840 and R01GM123055), the National Science Foundation (MCB1925643, DMS1614777, CMMI1825941, and DBI2003635), and the Purdue Institute of Drug Discovery. W.V.H. acknowledges NIH (R01GM112077).
M.S. and G.T. integrated the entire CryoFold pipeline, performed simulations and ROSETTA, EM validation, made figures, and wrote the manuscript. C.G. and G.D. designed the GUI. J.V., J.N., and A.M. designed the user guide and reported in the supplemental information. D.S. performed refinement of ATP synthase ensemble and performed refinements for the competition. N.J.S. performed de novo Rosetta validation. P.F. performed SFX experiments on flpp3 protein. W.D.V.H. performed Rosetta refinements and contributed to the supplemental information. E.T. contributed to the main text, and led the molecular dynamics simulation software (NAMD) developments. D.K. led MAINMAST developments. K.D. led MELD developments and wrote the manuscript. A.P. developed the program to combine MAINMAST, MELD, and MDFF, and wrote the manuscript. A.S. conceived the project, performed ReMDFF simulations, oversaw all the teams, and wrote the manuscript.
The authors declare no competing interests.
Keywords
- ATP synthase
- CryoEM modeling
- MAP3: Understanding
- computations
- cryoelectron microscopy
- ensemble refinement
- integrative modeling
- molecular dynamics simulations
- protein folding ensemble