Sander Group
EMBL Heidelberg, Europe
Prediction of 3D structure for the Asilomar contest. Dec. 4-8, 1994
Secondary structure was predicted for all target proteins using a neural
network method that takes sequence profiles as input (Rost and Sander,
1993, 1994).
The method we used for 3-D prediction in the contest draws on broad
evolutionary, biochemical and structural knowledge, and uses
hydrophobicity as the main numerical criterion for optimizing the
alignment of the model sequence to a known 3-D structure. We
typically built only one model, on a template chosen by intuition.
- Step 1: collect a multiple sequence alignment of the protein family
using MaxHom (Sander and Schneider, 1990). If possible, families are
extended by pattern searches (Rohde and Bork, CABIOS 1993, 9,
183-189). In one prediction case (cytidylyltransferase), a motif
similarity to a protein already in the PDB was identified. At least
two other remote homologies were missed: urease with adenosine
deaminase, and beta-galactosidase with beta-amylase; both were
detected by Dali structure comparison (Holm and Sander, JMB 1993).
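The pattern searches in Step 1 can be illustrated with a generic PROSITE-style motif scan. This is only a sketch of what such a search involves, not the Rohde and Bork program: it assumes patterns written in PROSITE syntax (elements joined by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for excluded ones, `(n)` or `(n,m)` for repeats, `<` and `>` for chain ends), translates them into Python regular expressions, and reports 1-based match positions in each family member. The function names and example sequences are hypothetical.

```python
import re

# One PROSITE pattern element: [ALT], {EXCL}, a single residue or x, optional (n) / (n,m) repeat.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex (hypothetical helper)."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def scan_family(pattern: str, sequences: dict[str, str]) -> dict[str, list[int]]:
    """Return 1-based start positions of non-overlapping matches per sequence."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() + 1 for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    family = {"seq_a": "AACAASTAGCC", "seq_b": "AACPPSPG"}  # made-up sequences
    print(prosite_to_regex("C-x(2,4)-[ST]-{P}-G"))  # C.{2,4}[ST][^P]G
    print(scan_family("C-x(2,4)-[ST]-{P}-G", family))  # {'seq_a': [3]}
```

Because `finditer` returns non-overlapping matches, a motif that occurs twice with overlapping spans is reported once; a lookahead wrapper would be needed to catch both.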
- Step 2: predict secondary structure from the multiple sequence
alignment using the PHD program (Rost and Sander, JMB 1993, 232,
584-599; Proteins 1994, 19, 55-72), supplemented by inspection of
conserved hydrophobicity patterns. Where the reliability index of the
PHD prediction was low, we occasionally overrode the predicted
secondary structure assignment.
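Conserved hydrophobicity patterns of the kind used in Step 2 are commonly examined through the hydrophobic moment, a standard measure introduced by Eisenberg and colleagues: hydrophobic residues recurring about every 100 degrees of rotation suggest an amphipathic helix, while strict alternation (180 degrees) suggests a strand with one buried face. The sketch below is our illustration of that analysis, not part of PHD; the Kyte-Doolittle scale, the gap handling and the simple helix-versus-strand rule are assumptions. Hydropathy is averaged per alignment column so that the pattern reflects the family rather than a single sequence.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydrophobicity(aligned_segments: list[str]) -> list[float]:
    """Mean hydropathy of each alignment column, ignoring gaps and unknown symbols."""
    profile = []
    for col in range(len(aligned_segments[0])):
        values = [KD[s[col]] for s in aligned_segments if s[col] in KD]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def hydrophobic_moment(profile: list[float], angle_deg: float) -> float:
    """Hydrophobic moment per residue for a given rotation angle between residues."""
    delta = math.radians(angle_deg)
    sin_sum = sum(h * math.sin(delta * i) for i, h in enumerate(profile))
    cos_sum = sum(h * math.cos(delta * i) for i, h in enumerate(profile))
    return math.hypot(sin_sum, cos_sum) / len(profile)


def periodicity_hint(aligned_segments: list[str]) -> tuple[str, float, float]:
    """Compare helical (100 deg) and strand (180 deg) periodicity of the family profile."""
    profile = column_hydrophobicity(aligned_segments)
    helix = hydrophobic_moment(profile, 100.0)
    strand = hydrophobic_moment(profile, 180.0)
    return ("helix" if helix > strand else "strand"), helix, strand


if __name__ == "__main__":
    # Made-up aligned segment from three family members ('-' marks a gap).
    segment = ["VKVKVKVT", "IEVRVKV-", "VKIKVEVS"]
    print(periodicity_hint(segment))
```

A hint like this is most useful exactly where the PHD reliability index is low, as a second opinion before overriding an assignment by hand.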
- Step 3: figure out a compact 3-D fold for the secondary structure
elements by drawing diagrams on paper. Very hydrophobic elements
should be buried in the interior, and invariant residues, which are
likely to cluster around the active site, should be close in space. If
a protein has beta strands, they must form sheets. There are a few
favoured folding motifs for alpha/beta proteins (Holm and Sander, JMB
1993, 233, 123-138). Helices may associate as bundles or polyhedra. In
the prediction game we proceeded to 3-D model building for two proteins
presumed to be TIM barrels.
In the second case (beta-galactosidase) we listed a number of fuzzy
reasons, based on sequence conservation in the protein family, that
favoured a TIM barrel topology:
- (i) the high frequency of TIM barrel topologies among glycosyl
hydrolases,
- (ii) the large size of the protein,
- (iii) a typical helix-Gxx-strand pattern, which is also shared by
TIM barrels that are unrelated in evolution,
- (iv) conserved residues clustered at the C-terminal ends of the
strands (all TIM barrels have the active site on this side),
- (v) short loops at the bottom and long loops at the top (the
active-site side), a characteristic seen when TIM barrel structures
are compared.
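Reason (iii) can be checked mechanically once a secondary structure prediction is available. The sketch below is a simplistic, assumed reading of the helix-Gxx-strand pattern: a predicted helix (H) ends, a short loop follows, and a glycine sits exactly three residues before the first residue of a predicted strand (E). The loop-length limit `max_loop` and the helper name are hypothetical choices, not taken from the original analysis.

```python
def helix_gxx_strand_sites(seq: str, ss: str, max_loop: int = 6) -> list[int]:
    """Return 1-based strand starts preceded by a helix and then a G-x-x motif.

    ss uses H (helix), E (strand) and any other letter for loop. The glycine may
    be the last helix residue or lie in the loop, but not inside a strand.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    sites = []
    for s in range(3, len(ss)):
        is_strand_start = ss[s] == "E" and ss[s - 1] != "E"
        if not is_strand_start or seq[s - 3] != "G":
            continue
        # Walk back to the nearest helix or strand residue before this strand.
        h = s - 1
        while h >= 0 and ss[h] not in "HE":
            h -= 1
        loop_length = s - 1 - h
        # Require a helix (not a strand) there, ending at or before the glycine.
        if h >= 0 and ss[h] == "H" and h <= s - 3 and loop_length <= max_loop:
            sites.append(s + 1)
    return sites


if __name__ == "__main__":
    # Made-up sequence with a matching predicted secondary structure string.
    print(helix_gxx_strand_sites("AAAAAGKLVIVD", "HHHHHLLLEEEE"))  # [9]
```

Counting such sites across the family alignment, rather than in one sequence, would match the conservation-based spirit of the argument better.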
- Step 4: generate a 3-D model by substituting the side chains while
keeping the backbone of a known structural template (program MaxSprout,
Holm and Sander, JMB 1991, 218, 183-194). The sequence-structure
alignment was built iteratively. The initial alignment tries to
conserve hydrophobic patches between the template and the modelled
sequence family (sequence is better conserved in the hydrophobic core
than on the protein surface). The model is then evaluated using atomic
solvation preference (Holm and Sander, JMB 1992, 225, 93-105).
Solvation preference profiles suggest places where the alignment should
be improved. Native proteins and deliberately misfolded models are
separated by a very large gap in solvation preference, and credible
models have solvation preferences closer to native proteins than to
misfolded ones. The alignment is iteratively improved to optimize
solvation preference. Loops and insertions are excluded from the 3-D
model, and the backbone remains fixed. The side-chain optimization in
rotamer space also tries to minimize clashes in the core; few clashes
are a further indication of a credible model.
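The alignment loop in Step 4 can be caricatured with a toy score. Holm and Sander's atomic solvation preference works at the level of atoms; the sketch below swaps in a much cruder residue-level proxy (Kyte-Doolittle hydropathy rewarded at buried template positions and penalised at exposed ones), an assumption made only for illustration. Given a burial pattern for the template core (loops removed, backbone fixed), it finds the best ungapped register of the query, lists positions with unfavourable contributions (the analogue of reading a solvation preference profile), and calls a model credible if its score is closer to a native reference value than to a misfolded one. All function names and reference values are hypothetical.

```python
# Kyte-Doolittle hydropathy, an assumed stand-in for atomic solvation preference.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def contribution(aa: str, buried: bool) -> float:
    """Reward hydrophobic residues at buried positions and polar ones at exposed positions."""
    return KD[aa] if buried else -KD[aa]


def solvation_proxy(segment: str, burial: list[bool]) -> float:
    """Mean per-residue score of a query segment placed on the template core."""
    return sum(contribution(aa, b) for aa, b in zip(segment, burial)) / len(burial)


def best_offset(query: str, burial: list[bool]) -> tuple[int, float]:
    """Ungapped register (0-based query offset) that maximises the proxy score."""
    n = len(burial)
    if len(query) < n:
        raise ValueError("query is shorter than the template core")
    offset = max(range(len(query) - n + 1),
                 key=lambda k: solvation_proxy(query[k:k + n], burial))
    return offset, solvation_proxy(query[offset:offset + n], burial)


def weak_positions(segment: str, burial: list[bool], threshold: float = 0.0) -> list[int]:
    """1-based template positions scoring below threshold: candidates for realignment."""
    return [i + 1 for i, (aa, b) in enumerate(zip(segment, burial))
            if contribution(aa, b) < threshold]


def is_credible(score: float, native_ref: float, misfolded_ref: float) -> bool:
    """Credible if the score lies closer to the native reference than to the misfolded one."""
    return abs(score - native_ref) < abs(score - misfolded_ref)


if __name__ == "__main__":
    burial = [True, True, False, False, True, True]  # hypothetical template core
    offset, score = best_offset("EEVIKDLIEE", burial)
    print(offset, round(score, 2))  # 2 4.07
    print(weak_positions("EVIKDL", burial))  # [1, 3, 5]
    print(is_credible(score, native_ref=3.0, misfolded_ref=0.5))  # True
```

A real run would iterate rather than rely on one global register: shift individual secondary structure elements around the weak positions, rebuild the side chains, and rescore.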
In the first case (xylanase) we obtained a 3-D model on a TIM barrel
template that scored well by solvation preference criteria, but we also
saw some implausible features in the model and did not believe it.
Having screwed up the first case, we put more weight on the reasonable
solvation preference in the second prediction (beta-galactosidase),
even though visual inspection showed some errors in that model too, and
we were right to do so.
Our sequence-structure fitness program for threading (FosFos = fitness
of sequence for structure; Ouzounis et al., JMB 1993) was not used for
the contest, as we do not consider it sufficiently reliable in its
present form.
Contributions from Liisa Holm, Burkhard Rost, Peer Bork and Chris
Sander. Abstract by Liisa Holm, edited by Chris Sander.