Research Article
Enriching the sequence substitution matrix by structural information
Octavian Teodorescu 1, Tamara Galor 1, Jaroslaw Pillardy 2, Ron Elber 1 *
1 Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, New York 14853
2 Cornell Theory Center, Cornell University, Upson Hall 4130, Ithaca, New York 14853
email: Ron Elber (ron@cs.cornell.edu)
*Correspondence to Ron Elber, Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, NY 14853
The calculations were performed on a Dell Edge cluster of the Cornell Theory
Center funded by the tri-institutional grant.
Funded by:
National Science Foundation; Grant Number: 9988519
NSERC Canadian fellowship
Cornell and Rockefeller Universities and Memorial Sloan Kettering Cancer Center
sequence alignment • threading • fitness function • sequence-to-structure matching • energy function • Z-score
A fundamental step in homology modeling is the comparison of two
protein sequences: a probe sequence with an unknown structure and
function and a template sequence for which the structure and
function are known. The detection of protein similarities relies on
a substitution matrix that scores the proximity of the aligned amino
acids. Sequence-to-sequence alignments use symmetric substitution
matrices, whereas threading protocols use asymmetric matrices that
test the fitness of the probe sequence for the structure of the
template protein. We propose a linear combination of the threading and
sequence-alignment scoring functions that produces a single (mixed)
scoring table. By fitting a single parameter (the relative
contribution of the BLOSUM 50 matrix and the threading energy table
of THOM2), we obtain a significant increase in prediction capacity in
the twilight zone of homology modeling (detecting sequence pairs with
<25% sequence identity but very similar structures). For a
difficult test of 176 homologous pairs with no signal of sequence
similarity, the mixed model detects between 40% and 100% more
protein pairs than pure threading does. Surprisingly, the linear
combination of the two models performs better than either threading
or sequence alignment alone when the percentage of sequence identity
is low. Finally, we suggest that further enrichment of substitution
matrices, combining more structural descriptors such as exposed
surface area or secondary structure, is expected to enhance the
signal as well.
Proteins 2003. © 2003 Wiley-Liss, Inc.
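
The single-parameter mixing described in the abstract can be illustrated with a minimal sketch. It is an assumption-laden toy, not the authors' implementation: the function name mixed_score, the weight lam, the "buried"/"exposed" site types, and all numerical values are hypothetical placeholders rather than the published BLOSUM 50 or THOM2 tables, and the abstract does not specify how the two terms are scaled relative to each other.

```python
# Toy illustration only: these tables are placeholders, NOT the published
# BLOSUM 50 or THOM2 parameters, and the site types are hypothetical.

# Symmetric sequence term: keyed by the unordered residue pair.
TOY_SEQ = {frozenset("A"): 5.0, frozenset("L"): 5.0, frozenset("AL"): -2.0}

# Asymmetric threading term: energy of a probe residue placed in a
# structural site of the template (lower energy means a better fit).
TOY_THREAD = {
    ("A", "buried"): -0.5, ("A", "exposed"): 0.25,
    ("L", "buried"): -1.0, ("L", "exposed"): 0.5,
}


def mixed_score(probe_res, template_res, site_type, lam,
                seq_table=TOY_SEQ, thread_table=TOY_THREAD):
    """Linear mix of a substitution score and a threading energy.

    lam = 1.0 gives pure sequence alignment, lam = 0.0 pure threading.
    """
    seq_term = seq_table[frozenset((probe_res, template_res))]
    # Negate the energy so that, like the substitution score,
    # larger values mean a more favorable match.
    thread_term = -thread_table[(probe_res, site_type)]
    return lam * seq_term + (1.0 - lam) * thread_term


if __name__ == "__main__":
    # Probe residue L aligned to template residue A in a buried site.
    for lam in (0.0, 0.5, 1.0):
        print(lam, mixed_score("L", "A", "buried", lam))
```

In this sketch the sequence term is unchanged when probe and template residues are swapped, while the threading term is not, so any mixed table with lam below 1 is asymmetric. Fitting lam would amount to scanning its value and keeping the one that detects the most known homologous pairs in a training set.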
Received: 11 December 2002; Accepted: 25 March 2003
DOI: 10.1002/prot.10474