TY - GEN
T1 - Learning protein folding energy functions
AU - Guan, Wei
AU - Ozakin, Arkadas
AU - Gray, Alexander
AU - Borreguero, Jose
AU - Pandit, Shashi
AU - Jagielska, Anna
AU - Wroblewska, Liliana
AU - Skolnick, Jeffrey
PY - 2011
Y1 - 2011
N2 - A critical open problem in ab initio protein folding is protein energy function design, which pertains to defining the energy of protein conformations in a way that makes folding most efficient and reliable. In this paper, we address this issue as a weight optimization problem and utilize a machine learning approach, learning-to-rank, to solve this problem. We investigate the ranking-via-classification approach, especially the RankingSVM method, and compare it with the state-of-the-art approach to the problem using the MINUIT optimization package. To maintain the physicality of the results, we impose non-negativity constraints on the weights. For this we develop two efficient non-negative support vector machine (NNSVM) methods, derived from L2-norm and L1-norm SVMs, respectively. We demonstrate an energy function which maintains the correct ordering with respect to structure dissimilarity to the native state more often, is more efficient and reliable for learning on large protein sets, and is qualitatively superior to the current state-of-the-art energy function.
AB - A critical open problem in ab initio protein folding is protein energy function design, which pertains to defining the energy of protein conformations in a way that makes folding most efficient and reliable. In this paper, we address this issue as a weight optimization problem and utilize a machine learning approach, learning-to-rank, to solve this problem. We investigate the ranking-via-classification approach, especially the RankingSVM method, and compare it with the state-of-the-art approach to the problem using the MINUIT optimization package. To maintain the physicality of the results, we impose non-negativity constraints on the weights. For this we develop two efficient non-negative support vector machine (NNSVM) methods, derived from L2-norm and L1-norm SVMs, respectively. We demonstrate an energy function which maintains the correct ordering with respect to structure dissimilarity to the native state more often, is more efficient and reliable for learning on large protein sets, and is qualitatively superior to the current state-of-the-art energy function.
KW - Ab initio protein folding
KW - Energy function
KW - Learning-to-rank
KW - Non-negativity constrained SVM optimization
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=84857173991&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2011.88
DO - 10.1109/ICDM.2011.88
M3 - Conference contribution
AN - SCOPUS:84857173991
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1062
EP - 1067
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -