PFRMAT AL 
TARGET T0074 
AUTHOR 9070-5088-8627 
METHOD Overview 
METHOD  
METHOD Fold recognition was performed using the Target98 (SAM-T98) method 
METHOD [3] using SAM version 2.1.1 [1], a refinement of the methods developed 
METHOD by this group for CASP2 [2].  This method attempts to find and multiply  
METHOD align a set of homologs to a given sequence, then create an HMM from that  
METHOD multiple alignment. 
METHOD  
METHOD First, a set of sequence weights is determined from the alignment.  Next,  
METHOD modelfromalign is used to build the model from the alignment and the  
METHOD sequence weights.  Finally, hmmscore performs a local, all-paths scoring  
METHOD of the sequences, using a reversed-sequence normalization feature. 
METHOD  
METHOD The weighting method, detailed in upcoming publications [3,4], 
METHOD combines the Henikoffs' scheme [5], Dirichlet mixtures [6], and an 
METHOD entropy method to set the final weights. 
METHOD  
METHOD Alignment generation 
METHOD  
METHOD The initial step uses BLASTP to search NRP (a non-redundant protein 
METHOD database) twice: once to produce a set of very close homologs, and once 
METHOD to produce a set of possible homologs. 
METHOD  
METHOD The method then uses multiple iterations of a selection, training, and  
METHOD alignment procedure.  Each iteration involves an initial alignment, a set  
METHOD of search sequences, a threshold value, and a transition regularizer.  
METHOD  
METHOD The first iteration uses a single sequence (or seed alignment) as the  
METHOD initial alignment and the close homologs found by BLASTP as the search  
METHOD set.  The threshold is set very strictly, so that only good matches  
METHOD to the sequence are considered.  This iteration uses a transition regularizer  
METHOD that was designed to match the gap costs used by BLASTP. 
METHOD  
METHOD On subsequent iterations the input alignment is the output from the 
METHOD previous iteration, the search set is the larger set of possible 
METHOD homologs found by BLASTP, and the thresholds are gradually loosened. 
METHOD The second through second-from-last iterations use a ``long-match'' 
METHOD transition regularizer, and the final iteration uses a transition regularizer  
METHOD trained on FSSP alignments. 
METHOD  
METHOD References 
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2): 95-107, 1996. 
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html.   
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. 
METHOD     Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and  
METHOD     Genetics, Suppl. 1, 134-9, 1997. 
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06, 
METHOD     Department of Computer Engineering, Univ. of California, Santa Cruz, 1998. 
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, 
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998. 
METHOD [5] S. Henikoff and J. C. Henikoff, JMB 243: 574-578, 1994. 
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S. 
METHOD     Mian, and D. Haussler, CABIOS 12(4): 327-345, 1996. 
METHOD  
METHOD  
METHOD All methods (wu-blast, double-blast, target HMM, library HMMs) scored 
METHOD the EF-hand domains fairly well, and the target submission stated that 
METHOD the protein contains a calcium-binding site, so we were confident that 
METHOD it is an EF-hand domain.  The main decision was between calmodulin-like 
METHOD and calbindin-like domains. 
METHOD  
METHOD In addition to using the specified domain, we tried a longer section 
METHOD of EP15_HUMAN that included the earlier EF-hand domain as well, in the 
METHOD hope that this would give us greater discrimination.  We also used an 
METHOD earlier (fourth) iteration of the target alignment building process 
METHOD than usual, since almost all the EF-hands were included in the 
METHOD alignment by the sixth iteration.  Since the calmodulin domains 
METHOD usually occur in pairs, this change made the calmodulins score better 
METHOD than the calbindins. 
METHOD  
METHOD The 2scpA library model scored the longer 2-domain sequence best 
METHOD (-28.36, in the region where we had fewer than 1% false positives in 
METHOD our superfamily fold-recognition tests).   
METHOD  
METHOD We then considered conservation patterns, gaps, and insertions in 
METHOD various alignments to the calbindins and calmodulins. 
METHOD One alignment to 1osa had high residue conservation, but had a large 
METHOD insertion in the first binding pocket and a gap between the helices 
METHOD that lie between the binding pockets.  One alignment to 5icb also had 
METHOD high residue identity, but needed 2 gaps on either side of the helix 
METHOD that distinguishes calbindins from calmodulins, which makes the 
METHOD calbindin hypothesis less likely. 
METHOD  
METHOD The first binding pocket probably does not bind calcium (it is 
METHOD disrupted in all our alignments), but the second pocket is highly 
METHOD conserved, so it probably retains calcium binding. 
METHOD  
METHOD We finally chose a 2scpA alignment that had an insertion in the first 
METHOD binding pocket and reasonably good conservation for the rest.  The 
METHOD placement of the first helix is somewhat arbitrary---the 2-residue 
METHOD insert could have been a 1-residue gap or a 6-residue insert just as 
METHOD easily.  
METHOD  
MODEL 1 
PARENT 2scp_A 
P 121 A 88 
W 122 K 89 
A 123 S 90 
V 124 V 91 
K 125 V 92 
P 126 E 93 
E 127 G 94 
D 128 P 95 
K 129 L 96 
A 130 P 97 
K 131 L 98 
Y 132 F 99 
D 133 F 100 
A 134 R 101 
I 135 A 102 
F 136 V 103 
D 137 D 104 
S 138 T 105 
L 139 N 106 
S 140 E 107 
N 143 D 108 
G 144 N 109 
F 145 N 110 
L 146 I 111 
S 147 S 112 
G 148 R 113 
D 149 D 114 
K 150 E 115 
V 151 Y 116 
K 152 G 117 
P 153 I 118 
V 154 F 119 
L 155 F 120 
L 156 G 121 
N 157 M 122 
S 158 L 123 
K 159 G 124 
L 160 L 125 
P 161 D 126 
V 162 K 127 
D 163 T 128 
I 164 M 129 
L 165 A 130 
G 166 P 131 
R 167 A 132 
V 168 S 133 
W 169 F 134 
E 170 D 135 
L 171 A 136 
S 172 I 137 
D 173 D 138 
I 174 T 139 
D 175 N 140 
H 176 N 141 
D 177 D 142 
G 178 G 143 
M 179 L 144 
L 180 L 145 
D 181 S 146 
R 182 L 147 
D 183 E 148 
E 184 E 149 
F 185 F 150 
A 186 V 151 
V 187 I 152 
A 188 A 153 
M 189 G 154 
F 190 S 155 
L 191 D 156 
V 192 F 157 
Y 193 F 158 
C 194 M 159 
A 195 N 160 
L 196 D 161 
E 197 G 162 
K 198 D 163 
E 199 S 164 
P 200 T 165 
V 201 N 166 
P 202 K 167 
M 203 V 168 
S 204 F 169 
L 205 W 170 
P 206 G 171 
P 207 P 172 
A 208 L 173 
L 209 V 174 
TER 
END 
