Proteins: Structure, Function, and Bioinformatics

Volume 54, Issue 1, Pages 41–48

Published Online: 22 Oct 2003

Copyright © 2004 Wiley-Liss, Inc., A Wiley Company

 Research Article
 
Enriching the sequence substitution matrix by structural information
Octavian Teodorescu¹, Tamara Galor¹, Jaroslaw Pillardy², Ron Elber¹*
¹Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, New York 14853
²Cornell Theory Center, Cornell University, Upson Hall 4130, Ithaca, New York 14853
 
email: Ron Elber (ron@cs.cornell.edu)

*Correspondence to Ron Elber, Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, NY 14853

The calculations were performed on the Dell Edge cluster of the Cornell Theory Center, funded by the tri-institutional grant.

Funded by:
 National Science Foundation; Grant Number: 9988519
 NSERC Canadian fellowship
 Cornell and Rockefeller Universities and Memorial Sloan Kettering Cancer Center

 

Keywords
sequence alignment • threading • fitness function • sequence-to-structure matching • energy function • Z-score

 

Abstract
A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence of unknown structure and function, and a template sequence whose structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence-to-sequence alignments use symmetric substitution matrices, whereas threading protocols use asymmetric matrices that test the fitness of the probe sequence to the structure of the template protein. We propose a linear combination of the threading and sequence-alignment scoring functions to produce a single (mixed) scoring table. By fitting a single parameter (the relative contribution of the BLOSUM 50 matrix and the THOM2 threading energy table), we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting pairs with <25% sequence identity but very similar structures). On a difficult test set of 176 homologous pairs with no signal of sequence similarity, the mixed model detects between 40 and 100% more protein pairs than pure threading does. Surprisingly, when sequence identity is low, the linear combination of the two models performs better than either threading or sequence alignment alone. Finally, we suggest that further enriching substitution matrices with additional structural descriptors, such as exposed surface area or secondary structure, is expected to enhance the signal as well. Proteins 2003. © 2003 Wiley-Liss, Inc.
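The mixing of a symmetric substitution table with an asymmetric threading table can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the tables are tiny hypothetical stand-ins (not actual BLOSUM 50 or THOM2 values), each template site is assumed to carry a hypothetical two-state environment label ("buried"/"exposed") that the threading term reads instead of the template residue, the single weight `lam` is assumed to enter as a convex combination, threading energies are negated so that higher is better for both terms, and the hypothetical helpers `mixed_score` and `global_align_score` use a plain Needleman-Wunsch recursion with an arbitrary linear gap penalty.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tables; the values are NOT taken from BLOSUM 50 or THOM2.
# Sequence substitution scores: symmetric, indexed by (probe residue, template residue),
# higher = more similar.
_SUB_PAIRS = {("A", "A"): 4, ("L", "L"): 4, ("K", "K"): 5,
              ("A", "L"): -1, ("A", "K"): -1, ("L", "K"): -2}
SUB: Dict[Tuple[str, str], float] = {}
for (a, b), s in _SUB_PAIRS.items():
    SUB[(a, b)] = SUB[(b, a)] = s

# Threading energies: asymmetric, indexed by (probe residue, template site environment),
# lower = better fit. The environment labels are a hypothetical simplification.
THREAD: Dict[Tuple[str, str], float] = {
    ("A", "buried"): -0.5, ("A", "exposed"): 0.0,
    ("L", "buried"): -1.5, ("L", "exposed"): 0.5,
    ("K", "buried"): 1.0, ("K", "exposed"): -1.0,
}

Site = Tuple[str, str]  # (template residue, structural environment)


def mixed_score(probe_res: str, site: Site, lam: float) -> float:
    """Score for placing probe_res at one template site.

    Assumed form: convex combination of the sequence score and the negated
    threading energy, so lam = 0 is pure sequence alignment and lam = 1 is
    pure threading.
    """
    template_res, env = site
    return (1.0 - lam) * SUB[(probe_res, template_res)] - lam * THREAD[(probe_res, env)]


def global_align_score(probe: str, template: List[Site], lam: float,
                       gap: float = -2.0) -> float:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(probe), len(template)
    # dp[i][j] = best score aligning probe[:i] with template[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + mixed_score(probe[i - 1], template[j - 1], lam),
                dp[i - 1][j] + gap,  # probe residue aligned to a gap
                dp[i][j - 1] + gap,  # template site aligned to a gap
            )
    return dp[n][m]


if __name__ == "__main__":
    template = [("A", "exposed"), ("L", "buried"), ("K", "exposed")]
    for lam in (0.0, 0.5, 1.0):
        print(lam, global_align_score("ALK", template, lam))
```

With these toy values, aligning the probe `ALK` to the three-site template scores 13.0 at `lam = 0` (pure sequence), 7.75 at `lam = 0.5`, and 2.5 at `lam = 1` (pure threading). In a real setting `lam` would be the single parameter fitted on training pairs; the threading term looks only at each template site's environment, not its residue identity, which is what makes that part of the mixed table asymmetric.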

Received: 11 December 2002; Accepted: 25 March 2003

 

Digital Object Identifier (DOI): 10.1002/prot.10474