TY - JOUR
T1 - Using physical potentials and learned models to distinguish native binding interfaces from de novo designed interfaces that do not bind
AU - Demerdash, Omar N.A.
AU - Mitchell, Julie C.
PY - 2013/11
Y1 - 2013/11
AB - Protein-protein interactions are a fundamental aspect of many biological processes. The advent of recombinant protein and computational techniques has allowed for the rational design of proteins with novel binding capabilities. It is therefore desirable to predict which designed proteins are capable of binding in vitro. To this end, we have developed a learned classification model that combines energetic and non-energetic features. Our feature set is adapted from specialized potentials for aromatic interactions, hydrogen bonds, electrostatics, shape, and desolvation. A binding model built on these features was initially developed for CAPRI Round 21, achieving top results in the independent assessment. Here, we present a more thoroughly trained and validated model, and compare various support-vector machine kernels. The Gaussian kernel model classified both high-resolution complexes and designed nonbinders with 79-86% accuracy on independent test data. We also observe that multiple physical potentials for dielectric-dependent electrostatics and hydrogen bonding contribute to the enhanced predictive accuracy, suggesting that their combined information is much greater than that of any single energetics model. We also study the change in predictive performance as the model features or training data are varied, observing unusual patterns of prediction in designed interfaces as compared with other data types.
KW - Machine learning
KW - Protein binding
KW - Protein complex
KW - Protein design
KW - Stacking interactions
UR - http://www.scopus.com/inward/record.url?scp=84885809230&partnerID=8YFLogxK
U2 - 10.1002/prot.24337
DO - 10.1002/prot.24337
M3 - Article
C2 - 23760773
AN - SCOPUS:84885809230
SN - 0887-3585
VL - 81
SP - 1919
EP - 1930
JO - Proteins: Structure, Function, and Bioinformatics
JF - Proteins: Structure, Function, and Bioinformatics
IS - 11
ER -