PFRMAT SS 
TARGET T0048 
AUTHOR 6844-7080-1795 
REMARK DOUBLET CODE METHOD - DOUC METHOD 
REMARK ___________________________________________________________ 
REMARK 
REMARK              M O D E L           O R D E R 
REMARK 
REMARK M u l t i p l e     S e q u e n c e     P r e d i c t i o n 
REMARK 
REMARK without           M1   complete case   M2  with 
REMARK strong/weak       M5  homologous case      strong/weak 
REMARK codons            _______________________  codons 
REMARK differentiation   M3                   M4  differentiation 
REMARK 
REMARK S i  n  g  l  e     S e q u e n c e     P r e d i c t i o n 
REMARK ___________________________________________________________ 
REMARK M5 is the label for models developed and checked during
REMARK CASP3. Since 12 AUGUST it is intermediate between
REMARK M3 and M1 and is the homologous case of M1 (only secondary
REMARK structures very similar to the target one are used in the
REMARK prediction alignment). This version of M5 is used for all
REMARK targets except 43, 44, 49, 52, 54, 63, 73, for which the
REMARK previous version, complete case M5, was used.
REMARK ____________________________________________________________ 
REMARK  SUGGESTED ORDER   OF MODELS ADEQUACY   TO NATIVE STRUCTURE
REMARK     MSP   [  M1  M2   >   M5 ]   >   [  M3  M4  ]   SSP 
REMARK   TO   BE   CHECKED    DURING   THE   CASP3   EXPERIMENT 
REMARK ____________________________________________________________ 
REMARK Success of the multiple sequence prediction models 1, 2
REMARK depends on the completeness and correctness of the alignment.
REMARK It is suggested, and is to be checked during CASP3, that a
REMARK complete and correct alignment yields the native structure.
REMARK ____________________________________________________________ 
METHOD     D O U B L E T       C O D E       M E T H O D 
METHOD Shestopalov Boris V.      Russian Academy of Sciences 
METHOD Sanct-Peterburg 194064    Institute of Cytology 
METHOD DOUblet Code Method   -   DOUC Method 
REMARK ____________________________________________________________ 
METHOD INTRO-DOUC-TION. The method is manual. It takes only
METHOD 30 minutes to perform a manual single sequence
METHOD pre-DOUC-tion of a 300-residue protein. The basis of
METHOD the method was published in 1990 (Shestopalov B.V.
METHOD Prediction of protein secondary structure by doublet
METHOD code method. Mol. Biol., Moscow, Engl. transl., 24/4,
METHOD p.900-907). For CASP3 the method has been modified.
METHOD ------------------------------------------------------------- 
METHOD DOUC-SCRIPTION. Coils, strands and helices consist
METHOD of overlapping structurons of 2, 3 and 5 residues,
METHOD which are encoded by the residue pairs (i, i+1),
METHOD (i, i+2) and (i, i+4) respectively. Codon tables are obtained
METHOD from analysis of residue pair occurrence in secondary
METHOD structures. The codon distributions in a primary structure
METHOD are written in three lines under the sequence.
METHOD Codons of different structural types usually overlap
METHOD in an amino acid sequence, so a choice between codons
METHOD is necessary. The rule is to exclude the smallest
METHOD number of codons that makes the overlaps disappear.
METHOD The resulting codon distributions are used for prediction.
METHOD If several variant distributions are obtained,
METHOD the prediction of some regions may be ambiguous,
METHOD and such regions cannot be predicted at this stage.
METHOD The average prediction accuracy of this procedure,
METHOD so-called single sequence prediction (SSP), is limited
METHOD to 63% because only local interactions are
METHOD considered. If one uses similar sequences whose
METHOD predicted secondary structures are also similar and carry
METHOD as much information as possible about the native secondary
METHOD structure, the average secondary structure from their
METHOD alignment may be closer to the experimental one by up to
METHOD 5-10%, and ambiguities are excluded. This is the version
METHOD of so-called multiple sequence prediction (MSP) used here.
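REMARK A minimal sketch (not the author's procedure, which is manual) of
REMARK how codons could be located and written in three lines under a
REMARK sequence. The names CODONS, GEOMETRY, find_codons and codon_lines
REMARK are hypothetical, and the mini-tables hold only a few strong
REMARK codons copied from the DOUC-CODONS remark. Resolving overlaps
REMARK between codons of different types is not attempted here.

```python
# Hypothetical mini-tables: a few strong codons copied from the
# DOUC-CODONS remark; the real tables are much larger.
CODONS = {
    "C": {"GP", "PG", "DG", "NG"},  # coil
    "E": {"VV", "IV", "VI", "YV"},  # strand
    "H": {"AA", "EE", "AL", "LL"},  # helix
}

# For each type: (offset between the paired residues, structuron length).
# Coil pairs are (i, i+1), strand pairs (i, i+2), helix pairs (i, i+4).
GEOMETRY = {"C": (1, 2), "E": (2, 3), "H": (4, 5)}


def find_codons(seq):
    """Return (type, start, end) for every codon found; indices are
    0-based and end is inclusive."""
    hits = []
    for ss, (offset, length) in GEOMETRY.items():
        for i in range(len(seq) - offset):
            if seq[i] + seq[i + offset] in CODONS[ss]:
                hits.append((ss, i, i + length - 1))
    return hits


def codon_lines(seq):
    """Render three lines (coil, strand, helix) marking the residues
    covered by at least one codon of that type."""
    lines = {ss: ["-"] * len(seq) for ss in GEOMETRY}
    for ss, start, end in find_codons(seq):
        for k in range(start, end + 1):
            lines[ss][k] = ss
    return ["".join(lines[ss]) for ss in ("C", "E", "H")]


if __name__ == "__main__":
    fragment = "GPVIV"
    print(fragment)
    for line in codon_lines(fragment):
        print(line)
```

REMARK For the fragment GPVIV the sketch marks residues 1-2 as coil
REMARK (pair GP) and residues 3-5 as strand (pair V.V at i, i+2).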
REMARK __________________________________________________________ 
METHOD DOUC-TAILS. The codons are classified as strong
METHOD and weak. A residue pair is a strong (weak)
METHOD codon if the probability of the respective structure
METHOD for this pair is greater than (equal to) the total
METHOD probability of the two other structures. The probability
METHOD is calculated from the occurrence of the pair
METHOD in a secondary structure database using the reverse
METHOD binomial distribution (2P-1=0.999).
METHOD The codon choice is made first among strong
METHOD codons. Then weak codons are considered.
METHOD The secondary structure database was first constructed
METHOD from the primary and secondary structures
METHOD of 257 proteins. The secondary structure of these
METHOD proteins was then predicted with the code obtained
METHOD from this database. A new database was constructed
METHOD from the primary and secondary structures of the correctly
METHOD predicted regions, and a new code was obtained from
METHOD it. The new code was used to predict the secondary
METHOD structure of new proteins, the correctly predicted
METHOD regions were added to the database, a new code was
METHOD obtained from the enlarged database, and the cycle
METHOD was repeated with further new proteins...
METHOD The DOUC-CODONS used for the CASP3 target prediction
METHOD are listed in the remarks after the method.
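REMARK A hedged sketch of the strong/weak test. The "reverse binomial
REMARK distribution (2P-1=0.999)" is read here, as an assumption, as a
REMARK two-sided 0.999 confidence level on the pair's probability. The
REMARK Wilson score interval is swapped in as a stand-in for the
REMARK author's calculation, and the names wilson_interval and
REMARK classify_pair are hypothetical. A pair is called strong when the
REMARK whole interval lies above 0.5, and weak when the interval
REMARK contains 0.5 (probability not distinguishable from equality).

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(k, n, confidence=0.999):
    """Wilson score interval for a binomial proportion k/n.

    Stand-in for the source's 'reverse binomial distribution';
    the exact procedure used by the author is not given.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def classify_pair(k, n, confidence=0.999):
    """Classify a residue pair for one structure type.

    k: occurrences of the pair in that structure type
    n: occurrences of the pair in all three structure types
    """
    if n == 0:
        return "none"
    low, high = wilson_interval(k, n, confidence)
    if low > 0.5:
        return "strong"  # reliably more probable than the other two
    if high >= 0.5:
        return "weak"    # 0.5 lies inside the interval
    return "none"


if __name__ == "__main__":
    for k, n in [(90, 100), (5, 10), (5, 100)]:
        print(k, n, classify_pair(k, n))
```

REMARK Under these assumptions a rare pair with few observations tends
REMARK to fall in the weak class, because its interval is wide.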
REMARK At this moment the DOUBLET CODE is 95-97% complete.
REMARK Probably most of the weak codons with rare residues (W,
REMARK C, H, M) will become strong ones or disappear. Therefore
REMARK a version of the DOUC method without differentiation
REMARK between strong and weak codons is also used.
METHOD Five models are used. Models 1, 2, 5 (3, 4) are
METHOD MSP (SSP). M5 is the label for models developed and checked
METHOD during CASP3. Secondary structures for the alignments
METHOD are selected from those obtained by pre-DOUC-tions
METHOD for the sequences listed before the MODEL line.
METHOD All structures whose similarity to the target
METHOD protein one is not less than 60%, COMPLETE CASE
METHOD (models 1, 2), or 80%, HOMOLOGOUS CASE (model 5),
METHOD are used for the secondary structure alignment. Models 1
METHOD and 3 are the ones without strong/weak differentiation.
METHOD Model 5 was last modified on 12 August 1998; the new
METHOD version IS NOT USED for targets 43, 44, 49, 52, 54, 63, 73
METHOD and is used for all other targets.
METHOD The final version of models 1, 2 is used for all targets
METHOD except targets 43, 52, 54, for which a more restrictive
METHOD version was used (the required similarity of the aligned
METHOD secondary structures to the target one was 70% then and is
METHOD 60% now), but this modification of M1 and M2 is not essential.
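REMARK A minimal sketch of the selection and averaging step, under
REMARK stated assumptions: the predicted secondary structures are
REMARK already aligned strings of equal length over C/E/H, similarity
REMARK is the percentage of identical positions, and the "average"
REMARK structure is taken as a per-position majority vote with ties
REMARK kept at the target's own state. The functions similarity and
REMARK consensus are hypothetical names, not part of the DOUC method.

```python
from collections import Counter


def similarity(a, b):
    """Percentage of positions with the same secondary structure state."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def consensus(target, others, threshold=60.0):
    """Majority-vote structure from the target and every other
    prediction whose similarity to the target is >= threshold.

    threshold=60.0 mirrors the COMPLETE CASE, 80.0 the HOMOLOGOUS CASE.
    """
    kept = [s for s in others if similarity(target, s) >= threshold]
    result = []
    for column in zip(target, *kept):
        counts = Counter(column)
        best = max(counts.values())
        winners = [state for state in counts if counts[state] == best]
        # column[0] is the target's own state; it wins any tie it is in.
        result.append(column[0] if column[0] in winners else winners[0])
    return "".join(result)


if __name__ == "__main__":
    target = "CCHHHCEE"
    others = ["CCHHHHEE", "CCHHHHCE", "EEEEEEEE"]
    print(consensus(target, others, threshold=60.0))
    print(consensus(target, others, threshold=80.0))
```

REMARK In this toy input the 60% threshold keeps two of the three
REMARK structures, while the 80% threshold keeps only one, so the
REMARK homologous case changes the target prediction less.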
REMARK ____________________________________________________________ 
REMARK DOUC-CODONS-07.06.98. The database volume is 150000 
REMARK amino acid residues. COIL. Strong codons: AD AG AP CG 
REMARK CN CP DC DD DG DH DK DN DP DS DT DW ED EG EP ES FP GA 
REMARK GC GD GE GG GH GK GL GM GN GP GQ GR GS GT GW HD HG HN 
REMARK HP HS IP KD KG KN KP KS LP MG MP ND NG NH NK NN NP NR 
REMARK NS NT PA PC PD PE PF PG PH PK PL PM PN PP PQ PR PS PT 
REMARK PW PY QG QP RG RP SD SG SH SK SN SP SQ SR SS ST TD TG 
REMARK TK TN TP TS VP WD WG WP YP.  Weak codons: CD CH DQ DR 
REMARK EN HH HK KC MN MS NC NQ NW NY QD QN QS RN RS SC SW WC 
REMARK WN WS. STRAND. Strong codons: CI CV FC FF FI FL FT FV 
REMARK FW FY HV IC IF II IL IT IV IW IY LC LF LI LL LV LY MF 
REMARK MV TF TH TI TT TV TY VC VF VI VL VS VT VV VW VY WF WI 
REMARK WV WY YC YF YI YL YT YV YY.  Weak codons: CC CF CL CM 
REMARK CT CW CY FM HC HF HH HI HW HY IM LW MC MI MW MY VH VM 
REMARK WC WH WL WW YH YW. HELIX.  Strong codons: AA AE AK AL 
REMARK AM AQ AR EA EE EK EL EM EQ ER IL IM KA KE KQ LA LL LM 
REMARK LQ LR MA ME ML MR QA QD QE QK QQ QR RA RE RK RM RQ RR.
REMARK Weak codons:   AH AW CM DR EH EW FL FM HE HH HM KM KR 
REMARK LI MH MI MK MM MQ MW QM RW WI WK WL WM WW YM. 
REMARK 127 COIL CODONS: 103 strong, 24 weak; 81 STRAND CODONS:
REMARK 53/28; 68 HELIX CODONS: 40/28. 276 IN TOTAL: 196/80.
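REMARK A small sketch for checking the tallies above. It parses one
REMARK list (the weak coil codons, copied from the DOUC-CODONS remark)
REMARK and re-adds the stated per-class counts. The names parse_codons,
REMARK COIL_WEAK and COUNTS are hypothetical helpers for illustration.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_codons(text):
    """Split a whitespace-separated codon list and validate each entry."""
    codons = text.split()
    for codon in codons:
        if len(codon) != 2 or not set(codon) <= AMINO_ACIDS:
            raise ValueError(f"not a residue pair: {codon!r}")
    if len(set(codons)) != len(codons):
        raise ValueError("duplicate codon in list")
    return codons


# Weak coil codons as listed in the DOUC-CODONS remark.
COIL_WEAK = parse_codons(
    "CD CH DQ DR EN HH HK KC MN MS NC NQ NW NY QD QN QS RN RS SC SW WC WN WS"
)

# (strong, weak) counts per structure type as stated in the remark.
COUNTS = {"COIL": (103, 24), "STRAND": (53, 28), "HELIX": (40, 28)}

strong_total = sum(strong for strong, _ in COUNTS.values())
weak_total = sum(weak for _, weak in COUNTS.values())

if __name__ == "__main__":
    print(len(COIL_WEAK), strong_total, weak_total, strong_total + weak_total)
```

REMARK The same parse can be applied to the other five lists to confirm
REMARK each stated count.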
REMARK ________________________________________________________ 
REMARK DOUC-LIST OF MSP SEQUENCES ADDED TO THE TARGET ONE:
REMARK NCBI 2134181, 2982796, 1421379, 1652924, 2664238. 
REMARK _________________________________________________________ 
MODEL  2 
M C 0.00 
T C 1.00 
A C 1.00 
L C 1.00 
T C 1.00 
Q C 0.00 
A H 1.00 
H H 1.00 
C H 1.00 
E H 1.00 
A H 1.00 
C H 1.00 
R H 1.00 
A H 1.00 
D H 1.00 
A H 1.00 
P C 1.00 
H C 1.00 
V C 1.00 
S C 1.00 
D C 0.00 
E H 1.00 
E H 1.00 
L H 1.00 
P H 1.00 
V H 1.00 
L H 1.00 
L H 1.00 
R H 0.00 
Q C 0.00 
I C 1.00 
P C 1.00 
D C 1.00 
W C 1.00 
N C 1.00 
I C 0.00 
E C 1.00 
V C 1.00 
R C 1.00 
D C 1.00 
G C 0.00 
I C 0.00 
M C 0.00 
Q C 0.00 
L C 0.00 
E C 0.00 
K C 0.00 
V E 1.00 
Y E 1.00 
L E 1.00 
F E 1.00 
K C 1.00 
N C 1.00 
F C 1.00 
K C 1.00 
H C 1.00 
A H 1.00 
L H 1.00 
A H 1.00 
F H 1.00 
T H 1.00 
N H 1.00 
A H 1.00 
V H 1.00 
G H 1.00 
E H 1.00 
I H 1.00 
S H 1.00 
E H 1.00 
A H 1.00 
E H 0.00 
G C 1.00 
H C 1.00 
H C 1.00 
P C 1.00 
G C 1.00 
L C 1.00 
L E 1.00 
T E 1.00 
E E 1.00 
W C 0.00 
G C 1.00 
K C 1.00 
V E 1.00 
T E 1.00 
V E 1.00 
T E 1.00 
W E 1.00 
W E 1.00 
S E 1.00 
H C 0.00 
S C 1.00 
I C 1.00 
K C 1.00 
G C 1.00 
L C 1.00 
H C 0.00 
R C 1.00 
N C 1.00 
D C 1.00 
F E 1.00 
I E 1.00 
M E 1.00 
A H 1.00 
A H 1.00 
R H 1.00 
T H 1.00 
D H 1.00 
E H 1.00 
V H 0.00 
A H 0.00 
K H 0.00 
T H 0.00 
A H 0.00 
E H 0.00 
G C 0.00 
R C 0.00 
K C 0.00 
END 
