PFRMAT AL 
TARGET T0049 
AUTHOR 9070-5088-8627 
REMARK  
REMARK Prediction date: Monday June 22, 1998 
REMARK Group name: UCSC-compbio 
REMARK Authors: Christian Barrett, Melissa Cline, Mark Diekhans, Kevin Karplus,
REMARK   David Haussler and Richard Hughey
REMARK University of California, Santa Cruz 
REMARK  
METHOD Overview 
METHOD  
METHOD Fold recognition was performed with the Target98 (SAM-T98) method
METHOD [3], built on SAM version 2.1.1 [1]; it refines the methods
METHOD developed by this group for CASP2 [2].  The method attempts to find a
METHOD set of homologs of a given sequence, build a multiple alignment of
METHOD them, and then create an HMM from that multiple alignment.
METHOD  
METHOD First, a set of sequence weights is computed from the multiple
METHOD alignment.  Next, the SAM program modelfromalign builds the model from
METHOD the alignment and the sequence weights.  Finally, hmmscore performs
METHOD local, all-paths scoring of the sequences, using a reversed-sequence
METHOD null model to normalize the scores.
METHOD  
METHOD The weighting method, detailed in forthcoming publications [3,4],
METHOD combines the Henikoffs' position-based weighting scheme [5], Dirichlet
METHOD mixture priors [6], and an entropy-based method that sets the final
METHOD weights.
METHOD  
METHOD Alignment generation 
METHOD  
METHOD The initial step uses BLASTP to search the non-redundant protein
METHOD database (NRP) twice: once to collect a set of very close homologs, and
METHOD once to collect a larger set of possible homologs.
METHOD  
METHOD The method then performs several iterations of a selection, training,
METHOD and alignment procedure.  Each iteration is defined by an initial
METHOD alignment, a set of sequences to search, a score threshold, and a
METHOD transition regularizer.
METHOD  
METHOD The first iteration uses a single sequence (or a seed alignment) as the
METHOD initial alignment and the close homologs found by BLASTP as the search
METHOD set.  The threshold is set very strictly, so that only good matches to
METHOD the sequence are considered.  This iteration uses a transition
METHOD regularizer designed to match the gap costs used by BLASTP.
METHOD  
METHOD On subsequent iterations, the input alignment is the output of the
METHOD previous iteration, the search set is the larger set of possible
METHOD homologs found by BLASTP, and the thresholds are gradually loosened.
METHOD The second through second-to-last iterations use a "long-match"
METHOD transition regularizer, and the final iteration uses a transition
METHOD regularizer trained on FSSP alignments.
METHOD  
METHOD References 
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2):95-107, 1996.
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler,
METHOD     R. Hughey, L. Holm, and C. Sander, Proteins: Structure, Function,
METHOD     and Genetics, Suppl. 1:134-139, 1997.
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06, 
METHOD     Department of Computer Engineering, Univ. of California, Santa Cruz, 1998. 
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, 
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998. 
METHOD [5] S. Henikoff and J. G. Henikoff, JMB 243:574-578, 1994.
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh,
METHOD     I. S. Mian, and D. Haussler, CABIOS 12(4):327-345, 1996.
METHOD  
METHOD  
METHOD We found clear homology to 3pte and 2bltA, with sum scores of -384.9
METHOD and -220.27 (well into the region where we saw no false positives in
METHOD our tests).  In an evolutionary tree built from the t49.t98_6
METHOD alignment, 3pte was closer to t49 than 2bltA was (despite the better
METHOD score for 2bltA).  When we restricted the alignment used to build the
METHOD scoring HMM to just the closest homologs found in the second
METHOD iteration of the alignment building, 3pte scored better than 2bltA,
METHOD so we decided to use 3pte as the template.
METHOD  
METHOD The alignments did not all agree, so we took the global alignment
METHOD produced by a model built from the 3pte alignment and its homologs,
METHOD and hand-edited it to pick up the good pieces of alignment we saw in
METHOD other candidate alignments and to move the gaps and insertions to the
METHOD outermost parts of loops.  Several of the gaps and insertions turned
METHOD out to cluster on the surface in 3-space, suggesting that this portion
METHOD of the molecule is not well conserved and may be responsible for any
METHOD difference in function between the two proteins.
METHOD  
METHOD The initial helix of T0049 probably starts somewhat earlier than the
METHOD initial helix of 3pte, but the alignment format cannot express this,
METHOD and we do not yet have the tools to produce atomic coordinates.
MODEL 1 
PARENT 3pte 
A 19 D 8 
A 20 T 9 
R 21 G 10 
L 22 L 11 
D 23 Q 12 
A 24 A 13 
V 25 V 14 
F 26 L 15 
D 27 H 16 
Q 28 T 17 
A 29 A 18 
L 30 L 19 
R 31 S 20 
E 32 Q 21 
R 33 G 22 
L 35 A 23 
V 36 P 24 
G 37 G 25 
A 38 A 26 
V 39 M 27 
A 40 V 28 
I 41 R 29 
V 42 V 30 
A 43 D 31 
R 44 D 32 
H 45 N 33 
G 46 G 34 
E 47 T 35 
I 48 I 36 
L 49 H 37 
Y 50 Q 38 
R 52 L 39 
A 53 S 40 
Q 54 E 41 
G 55 G 42 
L 56 V 43 
A 57 A 44 
D 58 D 45 
R 59 R 46 
E 60 A 47 
A 61 T 48 
G 62 G 49 
R 63 R 50 
P 64 A 51 
M 65 I 52 
R 66 T 53 
E 67 T 54 
D 68 T 55 
T 69 D 56 
L 70 R 57 
F 71 F 58 
R 72 R 59 
L 73 V 60 
A 74 G 61 
S 75 S 62 
V 76 V 63 
T 77 T 64 
K 78 K 65 
P 79 S 66 
I 80 F 67 
V 81 S 68 
A 82 A 69 
L 83 V 70 
A 84 V 71 
V 85 L 72 
L 86 L 73 
R 87 Q 74 
L 88 L 75 
V 89 V 76 
A 90 D 77 
R 91 E 78 
G 92 G 79 
E 93 K 80 
L 94 L 81 
A 95 D 82 
L 96 L 83 
D 97 D 84 
A 98 A 85 
P 99 S 86 
V 100 V 87 
T 101 N 88 
R 102 T 89 
W 103 Y 90 
L 104 L 91 
P 105 P 92 
E 106 G 93 
F 107 L 94 
R 108 L 95 
P 109 P 96 
E 116 D 97 
P 117 D 98 
L 118 R 99 
V 119 I 100 
T 120 T 101 
I 121 V 102 
H 122 R 103 
H 123 Q 104 
L 124 V 105 
L 125 M 106 
T 126 S 107 
H 127 H 108 
T 128 R 109 
S 129 S 110 
G 130 G 111 
L 131 L 112 
G 132 D 114 
Y 133 Y 115 
W 134 T 116 
L 135 N 117 
L 136 D 118 
E 137 M 119 
G 138 F 120 
A 139 A 121 
G 140 Q 122 
S 141 T 123 
V 142 V 124 
Y 143 P 125 
D 144 G 126 
R 145 F 127 
L 146 E 128 
G 147 S 129 
I 148 V 130 
S 149 R 131 
D 150 N 132 
G 151 K 133 
R 155 V 134 
D 156 F 135 
F 157 S 136 
D 158 Y 137 
L 159 Q 138 
D 160 D 139 
E 161 L 140 
N 162 I 141 
L 163 T 142 
R 164 L 143 
R 165 S 144 
L 166 L 145 
A 167 K 146 
S 168 H 147 
A 169 G 148 
P 170 V 149 
S 172 T 150 
F 173 N 151 
A 174 A 152 
P 175 P 153 
G 176 G 154 
S 177 A 155 
G 178 A 156 
W 179 Y 157 
Q 180 S 158 
Y 181 Y 159 
S 182 S 160 
L 183 T 162 
A 184 N 163 
L 185 F 164 
D 186 V 165 
V 187 V 166 
L 188 A 167 
G 189 G 168 
A 190 M 169 
V 191 L 170 
V 192 I 171 
E 193 E 172 
R 194 K 173 
A 195 L 174 
T 196 T 175 
G 197 G 176 
Q 198 H 177 
P 199 S 178 
L 200 V 179 
A 201 A 180 
A 202 T 181 
A 203 E 182 
V 204 Y 183 
D 205 Q 184 
A 206 N 185 
L 207 R 186 
V 208 I 187 
A 209 F 188 
Q 210 T 189 
P 211 P 190 
L 212 L 191 
G 213 N 192 
M 214 L 193 
R 215 T 194 
D 216 D 195 
C 217 T 196 
G 218 F 197 
F 219 Y 198 
V 220 V 199 
S 221 T 203 
A 222 V 204 
E 223 I 205 
P 224 P 206 
E 225 G 207 
R 226 T 208 
F 227 H 209 
A 228 A 210 
V 229 N 211 
P 230 G 212 
Y 231 Y 213 
H 232 L 214 
D 233 T 215 
G 234 P 216 
Q 235 D 217 
P 236 E 218 
E 237 A 219 
P 238 G 220 
V 239 G 221 
R 240 A 222 
M 241 L 223 
R 242 V 224 
D 243 D 225 
G 244 S 226 
I 245 T 227 
E 246 E 228 
V 247 Q 229 
S 272 Q 235 
G 273 S 236 
G 274 A 237 
A 275 G 238 
G 276 A 239 
M 277 V 240 
Y 278 I 241 
G 279 S 242 
S 280 S 243 
A 281 T 244 
D 282 Q 245 
D 283 D 246 
V 284 L 247 
L 285 D 248 
R 286 T 249 
A 287 F 250 
L 288 F 251 
E 289 S 252 
A 290 A 253 
I 291 L 254 
R 292 M 255 
A 293 S 256 
N 294 G 257 
P 295 Q 258 
G 296 L 259 
F 297 M 260 
L 298 S 261 
P 299 A 262 
E 300 A 263 
T 301 Q 264 
L 302 L 265 
A 303 A 266 
D 304 Q 267 
A 305 M 268 
A 306 Q 269 
R 307 Q 270 
R 308 W 271 
D 309 T 272 
Q 310 T 273 
A 311 V 274 
G 312 N 275 
V 313 S 276 
G 321 T 277 
W 322 Q 278 
G 323 G 279 
F 324 Y 280 
G 325 G 281 
Y 326 L 282 
L 327 G 283 
S 328 L 284 
A 329 R 285 
V 330 R 286 
L 331 R 287 
D 332 D 288 
D 333 L 289 
P 334 S 290 
A 335 C 291 
A 336 G 292 
A 337 I 293 
G 338 S 294 
T 339 V 295 
P 340 Y 296 
Q 341 G 297 
H 342 H 298 
A 343 T 299 
G 344 G 300 
T 345 T 301 
L 346 V 302 
Q 347 Q 303 
W 348 G 304 
Y 352 Y 305 
G 353 Y 306 
H 354 T 307 
S 355 Y 308 
W 356 A 309 
F 357 F 310 
V 358 A 311 
D 359 S 312 
R 360 K 313 
A 361 D 314 
L 362 G 315 
G 363 K 316 
L 364 R 317 
S 365 S 318 
V 366 V 319 
L 367 T 320 
L 368 A 321 
L 369 L 322 
T 370 A 323 
N 371 N 324 
T 372 T 325 
A 373 S 326 
Y 374 N 327 
M 377 N 328 
S 378 V 329 
G 379 N 330 
P 380 V 331 
L 381 L 332 
T 382 N 333 
I 383 T 334 
A 384 M 335 
L 385 A 336 
R 386 R 337 
D 387 T 338 
A 388 L 339 
V 389 E 340 
Y 390 S 341 
A 391 A 342 
R 392 F 343 
TER 
END 
