Article Abstract

Support vector training of protein alignment models

Author(s): Yu, Chun-Nam John (cnyu@cs.cornell.edu); Joachims, Thorsten (tj@cs.cornell.edu); Elber, Ron (ron@cs.cornell.edu); Pillardy, Jaroslaw (jarekp@tc.cornell.edu)

Editor(s): Speed, T; Huang, H

Source: Research in Computational Molecular Biology, Proceedings; Pages: 253-267; Published: 2007

Series: Lecture Notes in Computer Science, Volume 4453

Abstract: Sequence-to-structure alignment is an important step in homology modeling of protein structures. Incorporating features such as secondary structure, solvent accessibility, or evolutionary information improves sequence-to-structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 47% of the residues correctly and aligns over 70% of the residues within a shift of 4 positions.
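
Illustrative note: the two accuracy figures in the abstract (residues aligned correctly, and residues aligned within a shift of 4 positions) can be read as a shift-tolerant fraction of matched residue pairs. The sketch below is one plausible way to compute such a score and is not taken from the paper. The function name `shift_accuracy`, the representation of an alignment as a list of `(sequence_index, structure_index)` pairs, and the toy data are all assumptions made for illustration.

```python
def shift_accuracy(reference, predicted, max_shift=0):
    """Fraction of reference pairs (i, j) for which the predicted alignment
    maps residue i to a position within max_shift of j.

    Both alignments are lists of (sequence_index, structure_index) pairs;
    this representation is an assumption, not the paper's data format.
    """
    if not reference:
        raise ValueError("reference alignment has no aligned pairs")
    # Map each aligned sequence residue to its predicted structure position.
    predicted_partner = dict(predicted)
    hits = 0
    for i, j in reference:
        k = predicted_partner.get(i)  # None if residue i is left unaligned
        if k is not None and abs(k - j) <= max_shift:
            hits += 1
    return hits / len(reference)


# Toy alignments (hypothetical data, not from the paper).
reference = [(0, 0), (1, 1), (2, 2), (3, 5)]
predicted = [(0, 0), (1, 2), (2, 6), (4, 5)]

exact = shift_accuracy(reference, predicted)                 # exact matches only
within_4 = shift_accuracy(reference, predicted, max_shift=4)  # shift of up to 4
print(exact, within_4)
```

With `max_shift=0` the score counts only exactly reproduced pairs. Raising `max_shift` also credits residues placed near their reference position, which is why the second figure reported in the abstract is higher than the first.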