The program uses the Needleman-Wunsch algorithm to align sequences to structures. Structures are represented as strings of solvent accessibility, and sequences as strings of hydrophobicity. Each element of an accessibility or hydrophobicity string is assigned one of three values (e.g., high, medium, or low accessibility; hydrophobic, in-between, or hydrophilic). The cutoffs for assigning each residue to a class were determined empirically by maximizing the ability of globin sequences to identify globin structures. The scoring matrix and default gap penalties were determined in the same way.
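As a rough illustration of the alignment step, the sketch below scores a global (Needleman-Wunsch) alignment between a three-class hydrophobicity string and a three-class accessibility string. The class letters, the 3x3 score table, and the linear gap penalty are all assumptions made for the example; the empirically fitted matrix and gap penalties are not given here.

```python
# Hypothetical class letters (not from the original program):
#   sequence:  H = hydrophobic, I = in-between, P = hydrophilic
#   structure: B = low accessibility (buried), M = medium, E = high (exposed)
# The score values below are illustrative only.
SCORE = {
    ("H", "B"): 2, ("H", "M"): 0, ("H", "E"): -2,
    ("I", "B"): 0, ("I", "M"): 1, ("I", "E"): 0,
    ("P", "B"): -2, ("P", "M"): 0, ("P", "E"): 2,
}
GAP = -2  # assumed linear gap penalty per gapped position


def nw_score(seq_classes, struct_classes, score=SCORE, gap=GAP):
    """Return the optimal global alignment score (Needleman-Wunsch)."""
    n, m = len(seq_classes), len(struct_classes)
    # dp[i][j] = best score aligning the first i sequence positions
    # with the first j structure positions.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = (seq_classes[i - 1], struct_classes[j - 1])
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[pair],  # align the two positions
                dp[i - 1][j] + gap,              # gap in the structure string
                dp[i][j - 1] + gap,              # gap in the sequence string
            )
    return dp[n][m]
```

Calling `nw_score("HPH", "BEB")` aligns a hydrophobic-hydrophilic-hydrophobic pattern against a buried-exposed-buried pattern. A real search would repeat this against every structure in the library and would also need a traceback to recover the alignment itself.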
The sequence strings used in these searches are actually a kind of consensus sequence. In the simplest case, at every residue position in a set of aligned homologous sequences we ask which amino acid is the most hydrophilic at that position, and use that amino acid to assign the hydrophobicity class for the position. Since evolutionary replacement of a hydrophobic residue by a polar residue is more likely at a surface position than at a buried one, the most hydrophilic residue at a position in a sequence alignment should correlate better with solvent accessibility than would the residue from any single sequence. In practice, we found that throwing out one or a few of the most hydrophilic residues and then using the next most hydrophilic worked best. This is because (1) the sequence alignments can be ambiguous and (2) even in well-aligned sequences, subtle structural differences can allow a polar residue in one structure and not in another. We also found that treating Arg and Lys specially improved the results, because of the amphipathic nature of these side chains (see paper for details).
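The consensus rule with rejections can be sketched as follows. This is only an illustration: it assumes the Kyte-Doolittle hydropathy scale, made-up class cutoffs (`low`, `high`), and the class letters H/I/P, none of which are specified here, and it omits the special handling of Arg and Lys.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# Using this particular scale is an assumption of the sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def consensus_classes(alignment, rejections=1, low=-1.0, high=1.5):
    """Build a three-class hydrophobicity string from aligned sequences.

    alignment  : list of equal-length aligned sequences ('-' for gaps)
    rejections : how many of the most hydrophilic residues to discard
                 at each position before picking the next most hydrophilic
    low, high  : hypothetical class cutoffs (not the fitted values)
    """
    classes = []
    for column in zip(*alignment):
        # Sort ascending so the most hydrophilic residues come first.
        values = sorted(KD[aa] for aa in column if aa in KD)
        if not values:
            classes.append("I")  # all-gap column: arbitrary neutral class
            continue
        # Never reject every residue in the column.
        k = min(rejections, len(values) - 1)
        v = values[k]
        if v <= low:
            classes.append("P")  # hydrophilic
        elif v >= high:
            classes.append("H")  # hydrophobic
        else:
            classes.append("I")  # in-between
    return "".join(classes)
```

With `rejections=0` this reduces to the simplest case described above, where the single most hydrophilic residue decides the class of each position.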
BphC: Six homologues of BphC were aligned, partly from published alignments and partly by eye. The program was run with various numbers of residue rejections (see above for how the consensus sequence is derived) and with a couple of different sets of gap penalties, but the results consistently indicated a good match with catalase. For example, two rejections with the default gap penalties gave a Z score greater than 5 for catalase. All other structures scored between -3 and 2. With BphC I actually tried very hard to get the program to give a good alignment with glycolate oxidase (GO) rather than with catalase, because circumstantial evidence makes me suspect that the protein actually looks more like GO. However, although GO consistently had a high raw score, its Z score was never as high as that of catalase.
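How the Z scores were computed is not stated here. One common approach, sketched below purely as an assumption, is to compare the raw score for a structure against a background distribution of scores for that same structure (for example, from unrelated or shuffled sequences). Under that assumption a structure with a high raw score can still receive a modest Z score if its background scores are also high, which is the pattern described for GO. All numbers in the usage note are invented for illustration.

```python
from statistics import mean, stdev


def z_score(raw, background):
    """Z score of a raw alignment score against background scores.

    background: scores obtained against the same structure by some
    reference set of sequences (a hypothetical choice for this sketch).
    """
    mu = mean(background)
    sigma = stdev(background)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw - mu) / sigma
```

For instance, `z_score(50, [30, 32, 28, 31, 29])` is larger than `z_score(60, [50, 58, 42, 55, 45])`, even though the second raw score is higher, because the second background is both higher and more spread out.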
Synaptotagmin: The PKC domain is found twice in synaptotagmins. I
aligned the sequences of both domains from synaptotagmins of four
species, for a total of eight sequences. Here again the search was
performed using 1, 2, 3, or 4 rejected residues at each position. Unlike
with BphC, no one structure stood out as clearly better using any
single set of rejections and gap penalties. However, when a consensus
rating was generated from the average ranking of each
structure across the several different searches, two
structures seemed significantly better than all others: tobacco
bushy stunt virus (chain C) and hemagglutinin. I exercised
some subjective judgement in predicting the similarity to hemagglutinin,
because the biological functions of the two proteins are related.
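The consensus rating by average ranking can be sketched as below. The input layout (one dictionary of scores per search, higher meaning better) and the function name are assumptions made for the example, and ties within a search are not treated specially.

```python
def average_ranks(searches):
    """Average each structure's rank over several searches.

    searches: list of dicts mapping structure name -> score, one dict
    per rejection/gap-penalty setting (hypothetical input layout).
    Returns (average_rank, name) pairs, best average rank first.
    """
    ranks = {}
    for scores in searches:
        # Rank 1 goes to the highest-scoring structure in this search.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            ranks.setdefault(name, []).append(rank)
    return sorted((sum(r) / len(r), name) for name, r in ranks.items())
```

A structure that is never the top hit in any single search can still come out first here if it ranks near the top in all of them.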