Author(s): Yu CNJ (Yu, Chun-Nam John)[1], Joachims T (Joachims, Thorsten)[1], Elber R (Elber, Ron)[1], Pillardy J (Pillardy, Jaroslaw)[2]
Source: JOURNAL OF COMPUTATIONAL BIOLOGY    Volume: 15    Issue: 7    Pages: 867-880    Published: SEP 2008  
Abstract: Sequence-to-structure alignment is an important step in homology modeling of protein structures. Incorporating features such as secondary structure, solvent accessibility, or evolutionary information improves sequence-to-structure alignment accuracy, but conventional generative estimation techniques for alignment models impose independence assumptions that make these features difficult to include in a principled way. In this paper, we overcome this problem using a Support Vector Machine (SVM) method that provides a well-founded way of estimating complex alignment models with hundreds of thousands of parameters. Furthermore, we show that the method can be trained using a variety of loss functions. In a rigorous empirical evaluation, the SVM algorithm outperforms SSALN, a highly accurate generative alignment model that incorporates structural information. The alignment model learned by the SVM aligns 50% of the residues correctly and aligns over 70% of the residues within a shift of four positions.
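
The two accuracy figures at the end of the abstract (residues aligned correctly, and residues aligned within a shift of four positions) can be illustrated with a minimal sketch. This is an assumption about how such a shift-tolerant score might be computed, not the paper's own evaluation code: the function name `shift_accuracy`, the dictionary representation of an alignment (sequence position mapped to structure position), and the toy data are all hypothetical.

```python
from typing import Dict


def shift_accuracy(
    reference: Dict[int, int],
    predicted: Dict[int, int],
    max_shift: int = 0,
) -> float:
    """Fraction of reference-aligned residues placed within max_shift positions.

    Hypothetical representation: each alignment maps a sequence position to
    the structure position it is aligned with. Residues left unaligned
    (gapped) in the prediction count as misses.
    """
    if not reference:
        raise ValueError("reference alignment is empty")
    hits = 0
    for seq_pos, ref_struct_pos in reference.items():
        pred_struct_pos = predicted.get(seq_pos)
        if pred_struct_pos is not None and abs(pred_struct_pos - ref_struct_pos) <= max_shift:
            hits += 1
    return hits / len(reference)


# Toy data, invented for illustration only.
reference = {0: 0, 1: 1, 2: 2, 3: 3}
predicted = {0: 0, 1: 3, 2: 7, 3: 3}

exact = shift_accuracy(reference, predicted)                   # exact matches only
within_four = shift_accuracy(reference, predicted, max_shift=4)  # tolerate a shift of up to 4
```

With `max_shift=0` the score counts only exactly matching residue pairs; raising `max_shift` to 4 also credits residues placed up to four positions away from their reference partner, which is why the second figure in the abstract is the larger one.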