PFRMAT AL 
TARGET T0049 
AUTHOR 5827-4749-3439 
REMARK HMM prediction for t0049  
METHOD Note: this alignment was generated automatically, and has not 
METHOD been hand-edited. 
METHOD 
METHOD This prediction was made as follows: 
METHOD  
METHOD 1. Homologs to the target were gathered using BLAST. This search
METHOD identified two PDB structures as homologous to the target:
METHOD 3pte (e-11) and 2bltA (e-06).
METHOD  
METHOD 2. An HMM for close homologs (not including either structure) was  
METHOD created using UCSC SAM HMM software. 
METHOD  
METHOD 3. We scored PDB with this HMM. The structure 3pte had the  
METHOD highest score, followed by 2bltA. Because the training sequences
METHOD for the target HMM could bias it toward one structure over the
METHOD other, we also created an HMM for the target alone, using
METHOD modelfromalign with the target sequence as the "alignment."
METHOD When we scored PDB with this HMM, 3pte again had the highest score.  
METHOD We decided to examine each of these potential matches more closely,  
METHOD to determine which would be the closest of the two structures to the  
METHOD target. 
METHOD  
METHOD 4. When we scored t0049 against all the HMMs in our HMM library,  
METHOD these two structures again came to the top, with the HMM for 3pte  
METHOD having slightly higher scores. These HMMs were created by employing 
METHOD FASTA to identify homologous proteins, PRRP (Gotoh) to create a multiple 
METHOD alignment of the structure and homologs, and modelfromalign from 
METHOD the UCSC SAM software suite, to derive an HMM. We constructed 
METHOD HMMs for a non-redundant set of PDB at the 40% identity level 
METHOD (i.e., no two sequences in this set had more than 40% identity). 
METHOD  
METHOD 5. We then used Kimmen's Bayesian Evolutionary Tree Estimation software
METHOD (Proceedings of ISMB98) to conduct separate phylogenetic analyses of
METHOD 3pte and 2bltA, each with its associated homologs (as identified by
METHOD FASTA search). For each analysis, we used an alignment of the training
METHOD sequences used to estimate the corresponding HMM (using the SAM
METHOD align2model program to align the training sequences in each set to
METHOD their HMM).
METHOD We then built subfamily HMMs for each subfamily identified in the  
METHOD two families. The scores of t0049 against each subfamily-HMM were  
METHOD compared, with special attention paid to the subfamilies containing  
METHOD the structures, due to the overlap between the training sequences.  
METHOD These scores were close, although somewhat higher for the subfamily  
METHOD containing 3pte than for the subfamily containing 2bltA. 
METHOD  
METHOD 6. We analyzed the alignment of 3pte and its homologs and the
METHOD alignment of 2bltA and its homologs, to identify conservation
METHOD patterns in each multiple alignment.
METHOD  
METHOD 7. We aligned the target to each HMM separately, and examined the  
METHOD (Viterbi) alignment of the target to the structure and homologs,  
METHOD to determine whether the target aligned similar residues at apparently  
METHOD important positions in the structures. 
METHOD  
METHOD 8. These analyses supported the conclusion that the target is
METHOD somewhat closer to 3pte than to 2bltA.
METHOD  
METHOD 9. The alignment submitted is that of the target to the subfamily HMM  
METHOD built for 3pte itself (in fact, this subfamily contained only 3pte,  
METHOD and no other sequences, but the HMM was constructed to use information  
METHOD from all the sequences in the training data). 
METHOD Although we have begun the alignment with the first residue of 3pte,  
METHOD it is likely that the N-terminal 7-10 residues are not superimposable 
METHOD in three dimensions. 
MODEL  1 
PARENT 3pte 
L   6    A   1 
D   7    D   2 
P   8    L   3 
A  10    P   4 
F  11    A   5 
S  12    P   6 
L  13    D   7 
D  14    D   8 
A  15    T   9 
A  16    G  10 
L  22    L  11 
D  23    Q  12 
A  24    A  13 
V  25    V  14 
F  26    L  15 
D  27    H  16 
Q  28    T  17 
A  29    A  18 
L  30    L  19 
E  32    S  20 
R  33    Q  21 
R  34    G  22 
L  35    A  23 
V  36    P  24 
G  37    G  25 
A  38    A  26 
V  39    M  27 
A  40    V  28 
I  41    R  29 
V  42    V  30 
A  43    D  31 
R  44    D  32 
H  45    N  33 
G  46    G  34 
E  47    T  35 
I  48    I  36 
L  49    H  37 
R  51    Q  38 
R  52    L  39 
A  53    S  40 
Q  54    E  41 
G  55    G  42 
L  56    V  43 
A  57    A  44 
D  58    D  45 
R  59    R  46 
E  60    A  47 
A  61    T  48 
G  62    G  49 
R  63    R  50 
P  64    A  51 
M  65    I  52 
R  66    T  53 
E  67    T  54 
D  68    T  55 
T  69    D  56 
L  70    R  57 
F  71    F  58 
R  72    R  59 
L  73    V  60 
A  74    G  61 
S  75    S  62 
V  76    V  63 
T  77    T  64 
K  78    K  65 
P  79    S  66 
I  80    F  67 
V  81    S  68 
A  82    A  69 
L  83    V  70 
A  84    V  71 
V  85    L  72 
L  86    L  73 
R  87    Q  74 
L  88    L  75 
V  89    V  76 
A  90    D  77 
R  91    E  78 
G  92    G  79 
E  93    K  80 
L  94    L  81 
A  95    D  82 
L  96    L  83 
D  97    D  84 
A  98    A  85 
P  99    S  86 
V 100    V  87 
T 101    N  88 
R 102    T  89 
W 103    Y  90 
L 104    L  91 
P 105    P  92 
E 106    G  93 
F 107    L  94 
G 114    L  95 
S 115    P  96 
E 116    D  97 
P 117    D  98 
L 118    R  99 
V 119    I 100 
T 120    T 101 
I 121    V 102 
H 122    R 103 
H 123    Q 104 
L 124    V 105 
L 125    M 106 
T 126    S 107 
H 127    H 108 
T 128    R 109 
S 129    S 110 
G 130    G 111 
L 131    L 112 
G 132    Y 113 
Y 133    D 114 
W 134    Y 115 
L 135    T 116 
L 136    N 117 
E 137    D 118 
V 142    M 119 
Y 143    F 120 
D 144    A 121 
R 145    Q 122 
L 146    T 123 
G 147    V 124 
I 148    P 125 
S 149    G 126 
D 150    F 127 
G 151    E 128 
D 153    S 129 
L 154    V 130 
R 155    R 131 
D 156    N 132 
F 157    F 135 
D 158    S 136 
L 159    Y 137 
D 160    Q 138 
E 161    D 139 
N 162    L 140 
L 163    I 141 
R 164    T 142 
R 165    L 143 
L 166    S 144 
A 167    L 145 
S 168    K 146 
A 169    H 147 
P 170    G 148 
L 171    V 149 
S 172    T 150 
F 173    N 151 
A 174    A 152 
P 175    P 153 
G 176    G 154 
S 177    A 155 
G 178    A 156 
W 179    Y 157 
Q 180    S 158 
Y 181    Y 159 
S 182    S 160 
L 183    T 162 
A 184    N 163 
L 185    F 164 
D 186    V 165 
V 187    V 166 
L 188    A 167 
G 189    G 168 
A 190    M 169 
V 191    L 170 
V 192    I 171 
E 193    E 172 
R 194    K 173 
A 195    L 174 
T 196    T 175 
G 197    G 176 
Q 198    H 177 
P 199    S 178 
L 200    V 179 
A 201    A 180 
A 202    T 181 
A 203    E 182 
V 204    Y 183 
D 205    Q 184 
A 206    N 185 
L 207    R 186 
V 208    I 187 
A 209    F 188 
Q 210    T 189 
P 211    P 190 
L 212    L 191 
G 213    N 192 
M 214    L 193 
R 215    T 194 
D 216    D 195 
G 218    T 196 
F 219    F 197 
V 220    Y 198 
S 221    T 203 
A 222    V 204 
E 223    I 205 
P 224    P 206 
E 225    G 207 
R 226    T 208 
Y 231    H 209 
H 232    A 210 
D 233    N 211 
G 234    G 212 
Q 235    Y 213 
P 236    L 214 
E 237    T 215 
P 238    P 216 
V 239    D 217 
R 240    E 218 
M 241    A 219 
R 242    G 220 
D 243    G 221 
G 244    A 222 
I 245    Q 229 
E 246    T 230 
V 247    V 231 
P 248    S 232 
L 249    W 233 
P 250    A 234 
E 251    Q 235 
G 252    S 236 
H 253    A 237 
G 254    G 238 
A 255    A 239 
A 256    V 240 
V 257    I 241 
R 258    S 242 
F 259    S 243 
A 260    Q 245 
P 261    D 246 
S 262    L 247 
R 263    D 248 
V 264    T 249 
F 265    F 250 
E 266    F 251 
P 267    S 252 
G 268    A 253 
A 269    L 254 
Y 270    M 255 
A 275    S 256 
G 276    G 257 
M 277    Q 258 
Y 278    L 259 
G 279    M 260 
S 280    S 261 
A 281    A 262 
D 282    A 263 
D 283    Q 264 
V 284    L 265 
L 285    A 266 
R 286    Q 267 
A 287    M 268 
L 288    Q 269 
E 289    Q 270 
A 290    W 271 
I 291    T 272 
R 292    T 273 
A 293    V 274 
N 294    N 275 
E 300    S 276 
T 301    T 277 
L 302    Q 278 
G 321    G 279 
W 322    Y 280 
G 323    G 281 
F 324    L 282 
G 325    G 283 
Y 326    L 284 
L 327    R 285 
S 328    R 286 
A 329    R 287 
V 330    D 288 
L 331    L 289 
D 332    S 290 
D 333    C 291 
A 336    G 292 
A 337    I 293 
G 338    S 294 
T 339    V 295 
P 340    Y 296 
Q 341    G 297 
H 342    H 298 
A 343    T 299 
G 344    G 300 
T 345    T 301 
L 346    V 302 
Q 347    Q 303 
G 350    G 304 
V 351    Y 305 
Y 352    Y 306 
G 353    T 307 
H 354    Y 308 
S 355    A 309 
W 356    F 310 
F 357    A 311 
V 358    S 312 
D 359    K 313 
R 360    D 314 
A 361    G 315 
G 363    K 316 
L 364    R 317 
S 365    S 318 
V 366    V 319 
L 367    T 320 
L 368    A 321 
L 369    L 322 
T 370    A 323 
N 371    N 324 
T 372    T 325 
A 373    S 326 
Y 374    N 327 
E 375    N 328 
G 376    V 329 
M 377    N 330 
S 378    L 332 
G 379    N 333 
P 380    T 334 
L 381    M 335 
T 382    A 336 
I 383    R 337 
A 384    T 338 
L 385    L 339 
R 386    E 340 
D 387    S 341 
A 388    A 342 
V 389    F 343 
Y 390    C 344 
A 391    G 345 
R 392    K 346 
TER 
END 
