Abstract
In this study, we present the DNA-Binding Site Identifier (DBSI), a new structure-based method for predicting the residues on a protein that bind DNA. DBSI was trained and validated on a data set of 263 proteins (TRAIN-263) and tested on an independent set of protein-DNA complexes (TEST-206) and on data sets of 29 unbound (APO-29) and 30 bound (HOLO-30) protein structures distinct from the training data. We computed 480 candidate features for identifying protein residues that bind DNA, including new features that capture the electrostatic microenvironment within shells near the protein surface. Our iterative feature selection process identified features important in other models, as well as features unique to the DBSI model, such as a banded electrostatic feature with spatial separation comparable to the canonical width of the DNA minor groove. Validations and comparisons with established methods using a range of performance metrics demonstrate the predictive advantage of DBSI, and its comparable performance on unbound (APO-29) and bound (HOLO-30) conformations indicates robustness to binding-induced protein conformational changes. Finally, we offer our feature data table to others for integration into their own models or for testing improved feature selection and model training strategies based on DBSI.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | e160 |
Journal | Nucleic Acids Research |
Volume | 41 |
Issue number | 16 |
State | Published - Sep 2013 |
Externally published | Yes |
Funding
National Science Foundation CDI Program [CMMI-0941013] and the US Department of Energy Genomics: GTL and SciDAC Programs [DE-FG02-04ER25627]. Funding for open access charge: National Science Foundation CDI Program [CMMI-0941013].
Funders | Funder number |
---|---|
National Science Foundation | CMMI-0941013 |
U.S. Department of Energy | DE-FG02-04ER25627 |
Directorate for Engineering | 0941013 |