
PFRMAT AL
TARGET T0111
AUTHOR 3670-4530-6947
METHOD Overview
METHOD 
METHOD Fold recognition for this target was performed using the SAM-T99 and
METHOD SAM-T2K methods (which are similar to SAM-T98 [3]) using SAM version
METHOD 3.1 [1], a refinement of the methods developed by this group for CASP3
METHOD [7].  These methods attempt to find and multiply align a set of
METHOD homologs to a given sequence, then create an HMM from that multiple
METHOD alignment.
METHOD 
METHOD First, a set of sequence weights is determined from the alignment.  Next,
METHOD modelfromalign is used to build the model from the alignment and the
METHOD sequence weights.  Finally, hmmscore performs a local, all-paths scoring
METHOD of the sequences, using a reversed-sequence normalization feature.
METHOD 
METHOD The weighting method, detailed in publications [3,4], uses Dirichlet
METHOD mixtures [6] to regularize the counts and an entropy method to set the
METHOD final weights.
METHOD 
METHOD We are currently using SAM-T2K to generate the HMM from the target
METHOD sequence, but are still using the library of SAM-T99 template HMMs,
METHOD since the SAM-T2K method is not yet fully stable.
METHOD 
METHOD Alignment generation
METHOD 
METHOD The initial step uses WU-BLASTP to search NRP to get nested sets of
METHOD possible homologs---from a set of very similar sequences to a set of
METHOD possibly related sequences.
METHOD 
METHOD The method then uses multiple iterations of a selection, training, and 
METHOD alignment procedure.  Each iteration involves an initial alignment, a set 
METHOD of search sequences, a threshold value, and a transition regularizer. 
METHOD 
METHOD The first iteration uses a single sequence (or seed alignment) as the
METHOD initial alignment, and the most similar sequences found by BLASTP are
METHOD used as the search set.  The threshold is set very strictly, so that
METHOD only good matches to the sequence are considered.
METHOD 
METHOD On subsequent iterations the input alignment is the output from the
METHOD previous iteration, the search set is a larger set of possible
METHOD homologs found by BLASTP, and the thresholds are gradually loosened.
METHOD 
METHOD The HMM used for scoring and aligning sequences is built from a
METHOD multiple alignment using the w0.5 script, which aims to get an average
METHOD of 0.5 bits of information per column of the alignment.
METHOD 
METHOD All PDB protein sequences (including, unfortunately, theoretical
METHOD models) are scored with the HMM built from the target alignment, and
METHOD all the template HMMs in our SAM-T99 library (about 3760) are used to
METHOD score the target sequence.
METHOD 
METHOD High-scoring hits in either direction are examined by hand, as are
METHOD other potential targets found by public servers (see the CAFASP
METHOD experiment) or by functional considerations.
METHOD 
METHOD The final alignment is selected from among the various alignments
METHOD obtained by varying whether the template model or the target model is
METHOD used for the alignment, whether local or global alignment is chosen,
METHOD and various other parameters.  In some cases parts of different
METHOD alignments are combined.
METHOD 
METHOD 
METHOD References
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2):95-107, 1996.
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R.
METHOD     Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and
METHOD     Genetics, Suppl. 1, 134-9, 1997.
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06,
METHOD     Department of Computer Engineering, Univ. of California, Santa Cruz, 1998.
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard,
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998.
METHOD [5] S. Henikoff and J. C. Henikoff, JMB 243:574-578, Nov 1994.
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S.
METHOD     Mian, and D. Haussler, CABIOS 12(4):327-345, 1996.
METHOD [7] K. Karplus, C. Barrett, M. Cline, M. Diekhans, L. Grate, and R.
METHOD     Hughey, Predicting protein structure using only sequence information,
METHOD     Proteins: Structure, Function, and Genetics, Suppl. 3, 121-5, 1999.
METHOD 
METHOD 
METHOD 
METHOD T0111 is clearly a homology-modeling target, with BLAST finding 17
METHOD excellent hits.
METHOD 
METHOD We know the target is a dimer, so we restricted our attention to
METHOD 1ebg[AB], 1ebh[AB], 1one[AB], 2one[AB].
METHOD 
METHOD The top scores for our HMMs were overwhelmingly for these chains, as
METHOD expected.  We chose 2oneA as the template, since it is the newest
METHOD structure and was itself solved using 1one for molecular replacement.
METHOD 
METHOD We hand-edited the alignment to 2oneA produced by our 2track HMM to
METHOD move a mid-strand insert to the end of the strand and to unalign
METHOD residues around an insert and a deletion.  One insert was moved from
METHOD one end of a beta strand to the other.
METHOD 
METHOD The alignment was converted to 3D coordinates by using SCWRL and our
METHOD untested mini-threader "undertaker".  We provide two models:
METHOD model 1 after extensive optimization with undertaker, and model 2
METHOD with only minimal optimization.
METHOD 
METHOD In model 1, the extensive optimization has curled the final beta
METHOD strand away from its partner, which is probably incorrect---our score
METHOD function includes no terms for hydrogen bonds.
METHOD 
METHOD In model 2, the minimal modification has left the beta strand intact,
METHOD but has not resolved all the insertions and deletions.
METHOD 
METHOD Both models still have a few steric clashes.
MODEL 1
PARENT 2one_A
K 2 A 1
I 3 V 2
V 4 S 3
K 5 K 4
I 6 V 5
I 7 Y 6
G 8 A 7
R 9 R 8
E 10 S 9
I 11 V 10
I 12 Y 11
D 13 D 12
S 14 S 13
R 15 R 14
G 16 G 15
N 17 N 16
P 18 P 17
T 19 T 18
V 20 V 19
E 21 E 20
A 22 V 21
E 23 E 22
V 24 L 23
H 25 T 24
L 26 T 25
E 27 E 26
G 29 K 27
F 30 G 28
V 31 V 29
G 32 F 30
M 33 R 31
A 34 S 32
A 35 I 33
A 36 V 34
P 37 P 35
S 38 S 36
G 39 G 37
A 40 A 38
S 41 S 39
T 42 T 40
G 43 G 41
S 44 V 42
R 45 H 43
E 46 E 44
A 47 A 45
L 48 L 46
E 49 E 47
L 50 M 48
R 51 R 49
D 52 D 50
G 53 G 51
D 54 D 52
K 55 K 53
S 56 S 54
R 57 K 55
F 58 W 56
L 59 M 57
G 60 G 58
K 61 K 59
G 62 G 60
V 63 V 61
T 64 L 62
K 65 H 63
A 66 A 64
V 67 V 65
A 68 K 66
A 69 N 67
V 70 V 68
N 71 N 69
G 72 D 70
P 73 V 71
I 74 I 72
A 75 A 73
Q 76 P 74
A 77 A 75
L 78 F 76
I 79 V 77
G 80 N 80
K 81 I 81
D 82 D 82
A 83 V 83
K 84 K 84
D 85 D 85
Q 86 Q 86
A 87 K 87
G 88 A 88
I 89 V 89
D 90 D 90
K 91 D 91
I 92 F 92
M 93 L 93
I 94 I 94
D 95 S 95
L 96 L 96
D 97 D 97
G 98 G 98
T 99 T 99
E 100 A 100
N 101 N 101
K 102 K 102
S 103 S 103
K 104 K 104
F 105 L 105
G 106 G 106
A 107 A 107
N 108 N 108
A 109 A 109
I 110 I 110
L 111 L 111
A 112 G 112
V 113 V 113
S 114 S 114
L 115 L 115
A 116 A 116
N 117 A 117
A 118 S 118
K 119 R 119
A 120 A 120
A 121 A 121
A 122 A 122
A 123 A 123
A 124 E 124
K 125 K 125
G 126 N 126
M 127 V 127
P 128 P 128
L 129 L 129
Y 130 Y 130
E 131 K 131
H 132 H 132
I 133 L 133
A 134 A 134
E 135 D 135
L 136 L 136
N 137 S 137
G 138 K 138
T 139 S 139
P 140 K 140
G 141 T 141
K 142 P 143
Y 143 Y 144
S 144 V 145
M 145 L 146
P 146 P 147
V 147 V 148
P 148 P 149
M 149 F 150
M 150 L 151
N 151 N 152
I 152 V 153
I 153 L 154
N 154 N 155
G 155 G 156
G 156 G 157
E 157 S 158
H 158 H 159
A 159 A 160
D 160 G 161
N 161 G 162
N 162 A 163
V 163 L 164
D 164 A 165
I 165 L 166
Q 166 Q 167
E 167 E 168
F 168 F 169
M 169 M 170
I 170 I 171
Q 171 A 172
P 172 P 173
V 173 T 174
G 174 G 175
A 175 A 176
K 176 K 177
T 177 T 178
V 178 F 179
K 179 A 180
E 180 E 181
A 181 A 182
I 182 L 183
R 183 R 184
M 184 I 185
G 185 G 186
S 186 S 187
E 187 E 188
V 188 V 189
F 189 Y 190
H 190 H 191
H 191 N 192
L 192 L 193
A 193 K 194
K 194 S 195
V 195 L 196
L 196 T 197
K 197 K 198
A 198 K 199
K 199 R 200
G 200 Y 201
M 201 G 202
N 202 A 203
T 203 S 204
A 204 N 207
V 205 V 208
G 206 G 209
D 207 D 210
E 208 E 211
G 209 G 212
G 210 G 213
Y 211 V 214
A 212 A 215
P 213 P 216
N 214 N 217
L 215 I 218
G 216 Q 219
S 217 T 220
N 218 A 221
A 219 E 222
E 220 E 223
A 221 A 224
L 222 L 225
A 223 D 226
V 224 L 227
I 225 I 228
A 226 V 229
E 227 D 230
A 228 A 231
V 229 I 232
K 230 K 233
A 231 A 234
A 232 A 235
G 233 G 236
K 238 G 239
D 239 K 240
I 240 V 241
T 241 K 242
L 242 I 243
A 243 G 244
M 244 L 245
D 245 D 246
C 246 C 247
A 247 A 248
A 248 S 249
S 249 S 250
E 250 E 251
F 251 F 252
Y 252 F 253
K 253 K 254
D 254 D 255
G 255 G 256
K 256 K 257
Y 257 Y 258
V 258 D 259
L 259 L 260
K 265 K 271
A 266 W 272
F 267 L 273
T 268 T 274
S 269 G 275
E 270 P 276
E 271 Q 277
F 272 L 278
T 273 A 279
H 274 D 280
F 275 L 281
L 276 Y 282
E 277 H 283
E 278 S 284
L 279 L 285
T 280 M 286
K 281 K 287
Q 282 R 288
Y 283 Y 289
P 284 P 290
I 285 I 291
V 286 V 292
S 287 S 293
I 288 I 294
E 289 E 295
D 290 D 296
G 291 P 297
L 292 F 298
D 293 A 299
E 294 E 300
S 295 D 301
D 296 D 302
W 297 W 303
D 298 E 304
G 299 A 305
F 300 W 306
A 301 S 307
Y 302 H 308
Q 303 F 309
T 304 F 310
K 305 K 311
V 306 T 312
L 307 A 313
G 308 G 314
I 311 I 315
Q 312 Q 316
L 313 I 317
V 314 V 318
G 315 A 319
D 316 D 320
D 317 D 321
L 318 L 322
F 319 T 323
V 320 V 324
T 321 T 325
N 322 N 326
T 323 P 327
K 324 K 328
I 325 R 329
L 326 I 330
K 327 A 331
E 328 T 332
G 329 A 333
I 330 I 334
E 331 E 335
K 332 K 336
G 333 K 337
I 334 A 338
A 335 A 339
N 336 D 340
S 337 A 341
I 338 L 342
L 339 L 343
I 340 L 344
K 341 K 345
F 342 V 346
N 343 N 347
Q 344 Q 348
I 345 I 349
G 346 G 350
S 347 T 351
L 348 L 352
T 349 S 353
E 350 E 354
T 351 S 355
L 352 I 356
A 353 K 357
A 354 A 358
I 355 A 359
K 356 Q 360
M 357 D 361
A 358 S 362
K 359 F 363
D 360 A 364
A 361 A 365
G 362 G 366
Y 363 W 367
T 364 G 368
A 365 V 369
V 366 M 370
I 367 V 371
S 368 S 372
H 369 H 373
R 370 R 374
S 371 S 375
G 372 G 376
E 373 E 377
T 374 T 378
E 375 E 379
D 376 D 380
A 377 T 381
T 378 F 382
I 379 I 383
A 380 A 384
D 381 D 385
L 382 L 386
A 383 V 387
V 384 V 388
G 385 G 389
T 386 L 390
A 387 R 391
A 388 T 392
G 389 G 393
Q 390 Q 394
I 391 I 395
K 392 K 396
T 393 T 397
G 394 G 398
S 395 A 399
M 396 P 400
S 397 A 401
R 398 R 402
S 399 S 403
D 400 E 404
R 401 R 405
V 402 L 406
A 403 A 407
K 404 K 408
Y 405 L 409
N 406 N 410
Q 407 Q 411
L 408 L 412
I 409 L 413
R 410 R 414
I 411 I 415
E 412 E 416
E 413 E 417
A 414 E 418
L 415 L 419
G 416 G 420
E 417 D 421
K 418 N 422
A 419 A 423
P 420 V 424
Y 421 F 425
N 422 A 426
G 423 G 427
R 424 E 428
K 425 N 429
E 426 F 430
I 427 H 431
K 428 H 432
G 429 G 433
Q 430 K 435
A 431 L 436
TER
END

