
PFRMAT AL
TARGET T0094
AUTHOR 1020-4390-8741
REMARK Prediction team GERLOFF consists of the following
REMARK team members: Dietlind L. Gerloff,
REMARK Cairan Duffy, Zeti A. M. Hussein, 
REMARK Siu-wai Leung, Gina M. Cannarozzi.
REMARK
REMARK Prediction for T0094 was by: DLG, ZAMH.
METHOD
METHOD This submission is a TENTATIVE MANUAL THREADING ALIGNMENT for
METHOD CPDase from A.thaliana (T0094) assuming that the weak sequence
METHOD similarity to HISTONE ACETYLTRANSFERASE (1ygh), as detected by
METHOD pdb-blast (CAFASP server #1), bears any significance. 
METHOD The difficulties with building the model under this assumption
METHOD might indicate that the assumption is not valid; therefore, we
METHOD assign only MEDIUM-TO-LOW CONFIDENCE to this submission.
METHOD
METHOD WE SUBMIT THIS MANUAL THREADING ALIGNMENT WITH THE 
METHOD INTENTION TO ASSIST IN IMPROVING THE QUALITY OF AUTOMATED
METHOD THREADING ALIGNMENTS BY UNCOVERING WHICH OF THE CLUES COMMONLY
METHOD USED BY EXPERTS ALIGNING MANUALLY ARE MOST USEFUL AND COULD BE
METHOD EXPLORED SYSTEMATICALLY.
METHOD 
METHOD In this case, the clues for anchoring the alignment came from
METHOD predicted secondary structures, the requirement that two di-
METHOD sulfide bridges can form, and the 3-dimensional closeness of
METHOD putative active site residues. Details are given below.
METHOD
METHOD (0), multiple sequence alignment. Finding homologs by sequence
METHOD similarity was difficult; only one homolog (A.thaliana CPDase-
METHOD like) was retrieved from the non-redundant databases. Searching
METHOD complete and unfinished genome sequences at the NCBI (Microbial
METHOD genomes BLAST) and PEDANT (MIPS) sites yielded several candidates
METHOD but most were rejected as questionable. One exception was an ORF
METHOD detected in Geobacter via NCBI, where the alignment seemed to 
METHOD highlight two stretches of highly conserved sequence. When this
METHOD sequence was aligned with the A.thaliana sequences, the aa types
METHOD of the few conserved residues overall seemed appropriate for 
METHOD constituting an enzymatic active site for a CPDase. Our prediction
METHOD is based on this alignment of three sequences; it is likely to
METHOD break down if the assumption of homology between the Geobacter
METHOD sequence and T0094 proves to be artefactual.
METHOD 
METHOD (a), secondary structure predictions. The server predictions
METHOD made available through CAFASP2 differed in several elements.
METHOD Closer inspection indicated that some methods might have included
METHOD sequences in their multiple sequence alignments that might not be 
METHOD true homologs. The secondary structure prediction that seemed the
METHOD most compatible with the 1ygh structure, and thus the most suitable
METHOD for this exercise, was that by the Pred2ary server at UCSF
METHOD (J-M Chandonia, CAFASP server #7).  
METHOD 
METHOD (b), requirement for 2 disulfide bridges. Conservation of only two
METHOD Cys in the A.thaliana CPDase homolog indicated that one of the 
METHOD bridges should be formed between Cys64 and Cys86. This requirement
METHOD was not easy to accommodate, and we resorted to slightly shifting
METHOD the sequence-based alignment (pdb-blast) in the region #84-91 so
METHOD that Cys86 would point in the right direction. No other Cys
METHOD were conserved. Based on the model, the second disulfide bridge
METHOD would be between Cys110 and Cys177; there would seem to be no way
METHOD to connect Cys104 and Cys159.
METHOD 
METHOD (c), putative active site. Our present model brings together most
METHOD of the conserved functional residues in the three-sequence
METHOD alignment. Specifically, if the model were correct, His119, Ser121,
METHOD Leu123, Lys132, and Trp171 would be well-positioned to participate in
METHOD the enzymatic function. Further, we would like to think that the other
METHOD short conserved sequence between the Geobacter and Arabidopsis 
METHOD sequences, particularly His42 and Thr44, could be brought near
METHOD the same site; this sequence would be in an insertion to the 1ygh
METHOD fold and was not modelled. Maybe coincidentally, a similar conserved
METHOD region was also found in the C-terminal domain of the tRNA-ligase
METHOD sequences which are known to be responsible for CPDase activity in
METHOD these proteins (according to SWISSPROT entries and literature).
METHOD
METHOD One of the indications in disfavour of our model, and of the weak
METHOD sequence similarity between T0094 and 1ygh suggested by pdb-blast,
METHOD is that the CPDase active site would be located in a different
METHOD region of the common fold than that of the known acetyltransferases.
METHOD Further, the similarity between the two conserved regions (PHVTV...
METHOD and PHLSL...) might just as well be accommodated by placing the
METHOD putative active site on two neighboring strands. This does not
METHOD seem possible if 1ygh is used as the parent fold. Together, these
METHOD are reasons for us to remain skeptical, overall, about this model
METHOD and the underlying assumptions.
METHOD 
MODEL  1
PARENT 1ygh_A
K   6  I 100
D   7  E 101
V   8  F 102
Y   9  R 103
S  10  V 104
V  11  V 105
W  12  N 106
A  13  N 107
L  14  D 108
P  15  N 109
D  16  T 110
E  17  K 111
E  18  E 112
S  19  N 113
E  20  M 114
P  21  M 115
R  22  V 116
F  23  L 117
K  24  T 118
K  25  G 119
L  26  L 120
M  27  K 121
E  28  N 122
A  29  I 123
L  30  F 124
R  31  Q 125
S  32  K 126
E  33  Q 127
F  34  L 128
T  35  P 129
A  56  P 132
K  57  K 133
K  58  E 134
M  59  Y 135
F  60  I 136
E  61  A 137
S  62  R 138
A  63  L 139
C  64  V 140
D  65  Y 141
G  66  D 142
L  67  H 145
K  68  L 146
A  69  S 147
Y  70  M 148
T  71  A 149
A  72  V 150
D  75  V 158
R  76  G 159
V  77  G 160
S  78  I 161
T  79  T 162
G  80  Y 163
T  81  R 164
F  84  E 173
Q  85  I 174
C  86  V 175
V  87  F 176
F  88  C 177
L  89  A 178
L  90  I 179
L  91  S 180
A 100  Y 188
G 101  G 189
E 102  A 190
H 103  H 191
C 104  L 192
K 105  M 193
N 106  N 194
H 107  H 195
F 108  L 196
N 109  K 197
C 110  D 198
S 111  Y 199
T 112  V 200
T 113  R 201
T 114  N 202
P 115  T 203
Y 116  S 204
M 117  N 205
P 118  I 206
H 119  K 207
L 120  Y 208
S 121  F 209
L 122  L 210
L 123  T 211
Y 124  Y 212
A 125  A 213
E 126  D 214
L 127  N 215
T 128  Y 216
E 129  A 217
E 130  I 218
E 131  G 219
K 132  Y 220
K 133  F 221
N 134  K 222
A 135  G 225
Q 136  F 226
E 137  T 227
K 138  K 228
A 139  E 229
Y 140  I 230
T 141  T 231
L 142  L 232
D 143  D 233
S 144  K 234
S 145  S 235
L 146  I 236
D 147  W 237
G 148  M 238
L 149  G 239
S 150  Y 240
E 164  E 245
D 165  G 246
K 166  G 247
T 167  T 248
L 168  L 249
E 169  M 250
T 170  Q 251
W 171  C 252
E 172  S 253
T 173  M 254
V 174  L 255
A 175  P 256
V 176  R 257
C 177  I 258
N 178  R 259
L 179  Y 260
TER
END
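The alignment records above can be checked mechanically against the
statements in the METHOD section. The following is a minimal sketch, not
part of the submission itself. It assumes each alignment record has four
whitespace-separated fields (target residue, target number, parent
residue, parent number). The function names and the EXCERPT constant are
hypothetical, and the excerpt copies only the records for target
residues 33-35 and 56-65 from MODEL 1. The sketch lists unaligned
stretches of the target and shows which parent position each target Cys
is aligned to.

```python
import re

# One alignment record: target aa, target number, parent aa, parent number.
# The four-field layout is an assumption read off the records in MODEL 1.
AL_LINE = re.compile(r"^([A-Z])\s+(\d+)\s+([A-Z])\s+(\d+)$")

# Records copied from MODEL 1 (target residues 33-35 and 56-65).
EXCERPT = """\
E  33  Q 127
F  34  L 128
T  35  P 129
A  56  P 132
K  57  K 133
K  58  E 134
M  59  Y 135
F  60  I 136
E  61  A 137
S  62  R 138
A  63  L 139
C  64  V 140
D  65  Y 141
"""


def parse_al_pairs(text):
    """Return (target_aa, target_num, parent_aa, parent_num) tuples.

    Lines that do not look like alignment records (MODEL, PARENT, TER,
    METHOD text, ...) are skipped.
    """
    pairs = []
    for line in text.splitlines():
        match = AL_LINE.match(line.strip())
        if match:
            t_aa, t_num, p_aa, p_num = match.groups()
            pairs.append((t_aa, int(t_num), p_aa, int(p_num)))
    return pairs


def target_gaps(pairs):
    """Return (first, last) ranges of target residues left unaligned."""
    gaps = []
    for (_, prev_num, _, _), (_, next_num, _, _) in zip(pairs, pairs[1:]):
        if next_num - prev_num > 1:
            gaps.append((prev_num + 1, next_num - 1))
    return gaps


def aligned_cysteines(pairs):
    """Map each aligned target Cys number to its (parent_aa, parent_num)."""
    return {t_num: (p_aa, p_num)
            for t_aa, t_num, p_aa, p_num in pairs if t_aa == "C"}


if __name__ == "__main__":
    pairs = parse_al_pairs(EXCERPT)
    print("unaligned target ranges:", target_gaps(pairs))
    print("aligned Cys:", aligned_cysteines(pairs))
```

On this excerpt the unaligned target range 36-55 covers His42 and Thr44,
which agrees with the METHOD note that this conserved stretch lies in an
insertion relative to the 1ygh fold and was not modelled. Running the
same functions over the full set of records would list the remaining
gaps and the parent positions of Cys86, Cys104, Cys110 and Cys177.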





