TS  # 3D atomic coordinates (Tertiary Structure) prediction
SS  # Secondary Structure prediction
RR  # Residue-Residue separation distance prediction
AL  # Format to express unambiguous ALignments to PDB entries
Examples:
Prediction code T0045SS067_1 has the following components:
T0045 target number
SS Secondary Structure prediction (PFRMAT SS)
067 prediction group 67
1 model index 1 (see MODEL record)
Prediction code T0044TS005_2 has the following components:
T0044 target number
TS Tertiary Structure (3D atomic coordinates, PFRMAT TS)
005 prediction group 5
2 model index 2 (by default considered as FINAL/REFINED)
Prediction code T0044TS005_2u has the following components:
T0044 target number
TS Tertiary Structure (3D atomic coordinates, PFRMAT TS)
005 prediction group 5
2u model index 2 UNREFINED set of coordinates
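The three examples above can be decomposed mechanically. The following is a minimal sketch, assuming the layout inferred from those examples (a "T" plus four-digit target number, a two-letter format code, a three-digit group number, an underscore, a model index from 1 to 5, and an optional "u"). The names CODE_RE, PredictionCode and parse_prediction_code are illustrative and not part of the format definition.

```python
import re
from typing import NamedTuple

# Layout inferred from the examples in the text; the field widths are an
# assumption, not something the specification states explicitly.
CODE_RE = re.compile(r"^(T\d{4})(TS|SS|RR|AL)(\d{3})_([1-5])(u?)$")


class PredictionCode(NamedTuple):
    target: str      # e.g. "T0044"
    pfrmat: str      # "TS", "SS", "RR" or "AL"
    group: int       # prediction group number
    model: int       # model index, 1..5
    unrefined: bool  # True when the index carries the "u" suffix


def parse_prediction_code(code: str) -> PredictionCode:
    """Split a prediction code such as T0044TS005_2u into its components."""
    match = CODE_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognizable prediction code: {code!r}")
    target, pfrmat, group, model, suffix = match.groups()
    return PredictionCode(target, pfrmat, int(group), int(model), suffix == "u")


if __name__ == "__main__":
    for example in ("T0045SS067_1", "T0044TS005_2", "T0044TS005_2u"):
        print(example, parse_prediction_code(example))
```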
PFRMAT  Format specification # TS , SS , RR or AL
TARGET  Target identifier from the CASP3 target table
AUTHOR  XXXX-XXXX-XXXX # The registration code of the Group Leader
REMARK  Comment record (may appear anywhere, optional)
METHOD  Records describing the methods used
MODEL   Beginning of the data section for the submitted model
PARENT  Specifies structure template used to generate the TS/AL model
TER     Terminates independent segments of structure in the TS/AL model
END     End of the submitted model
Record PFRMAT is used for all submissions.
PFRMAT TS
TS indicates that the submission contains 3D atomic coordinates
in standard PDB format
PFRMAT SS
SS indicates that the submission contains secondary structure
prediction
PFRMAT RR
RR indicates that the submission contains residue-residue
separation distance prediction
PFRMAT AL
AL indicates that the submission contains unambiguous alignments
to PDB entries
Record TARGET is used for all submissions.
TARGET Txxxx
Txxxx indicates the id of the target predicted.
Targets from the CASP3 target table are valid.
Record AUTHOR is used for all submissions.
AUTHOR XXXX-XXXX-XXXX
XXXX-XXXX-XXXX indicates the Group Leader's registration code.
This code is the prediction submission code obtained upon
registration at the CASP3 WEB sites (Prediction Center).
Members of prediction groups who intend to submit predictions
should use the registration code of the Group Leader for all
predictions submitted by that group.
REMARK Optional. PDB style 'REMARK' records may be used
anywhere in the submission. These records may contain any
text and will in general not influence evaluation.
Records METHOD are used for all submissions.
These records describe the methods used. Predictors are urged to provide
as full a description of the methods as possible, including references,
data libraries used, and values of non-default parameters.
These descriptions will be made available via the Prediction Center WEB
pages as well as printed along with the other materials distributed at the
meeting. Length of 100 - 500 words is suggested.
Record MODEL is used for all submissions.
Signifies the beginning of model data (3D atomic coordinates, an unambiguous
alignment to a PDB entry, residue-residue separation distance prediction,
or secondary structure prediction).
MODEL n [REFINED|UNREFINED]
n Model index n gives the predictor's ranking of the models
according to which is believed to be closest to the
target structure (1 <= n <= 5). The model index is included
automatically in the ACCESSION CODE.
REFINED The set of coordinates labeled REFINED will be considered
as a final model (to allow the evaluation of the results
of an automated refinement process, such as molecular
dynamics). Models submitted without a REFINED or UNREFINED
label will be considered final by default.
UNREFINED Coordinates labeled UNREFINED will be compared only to
the final set (REFINED) with the same model index n, to
evaluate the effectiveness of the refinement method. If an
UNREFINED model is submitted, a REFINED model must be
submitted as well. The letter "u" will be added to the
model index in the ACCESSION CODE of the UNREFINED model.
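The MODEL rules above (index range, default label, "u" suffix, and the requirement that every UNREFINED model has a REFINED counterpart) can be checked with a short script. This is only a sketch under the rules as stated in this section. The function names are hypothetical and do not describe the Prediction Center's own acceptance software.

```python
from typing import Iterable, List, Tuple


def parse_model_record(line: str) -> Tuple[int, str]:
    """Return (model index, label) for a line such as 'MODEL 2 UNREFINED'."""
    fields = line.split()
    if len(fields) not in (2, 3) or fields[0] != "MODEL":
        raise ValueError(f"not a MODEL record: {line!r}")
    index = int(fields[1])
    if not 1 <= index <= 5:
        raise ValueError(f"model index must be in 1..5, got {index}")
    # A record without a label is treated as final, i.e. REFINED.
    label = fields[2] if len(fields) == 3 else "REFINED"
    if label not in ("REFINED", "UNREFINED"):
        raise ValueError(f"unknown MODEL label: {label!r}")
    return index, label


def accession_suffix(index: int, label: str) -> str:
    """Model part of the accession code: '2' for final, '2u' for UNREFINED."""
    return f"{index}u" if label == "UNREFINED" else str(index)


def unrefined_without_refined(lines: Iterable[str]) -> List[int]:
    """List model indices submitted as UNREFINED with no final counterpart."""
    refined, unrefined = set(), set()
    for line in lines:
        index, label = parse_model_record(line)
        (unrefined if label == "UNREFINED" else refined).add(index)
    return sorted(unrefined - refined)


if __name__ == "__main__":
    records = ["MODEL 1 UNREFINED", "MODEL 1 REFINED", "MODEL 2 UNREFINED"]
    print(unrefined_without_refined(records))
```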
Record PARENT is used for all submissions in the TS (and AL)
format.
PARENT record indicates structure templates used to generate any independent
segment of MODEL (see description of the TS format below).
The PARENT record should be placed as the first record of any such independent
segment. Only one PARENT record per structure segment is allowed.
PARENT N/A
Indicates an ab initio prediction, not directly based on any known
structure. Note that this is the only indication in the file that the
prediction is ab initio, so it is a critical piece of information.
PARENT NONE [n1 n2]
Indicates that the predictor believes that there is no structure in
the present PDB that is close enough to be used as a template. This
is an entry requested by those predictors who use threading and
sequence comparison methods. With structural genomics projects being
designed to determine the structure of proteins with novel folds, the
ability to predict when a fold is unknown is becoming increasingly
important, and predictors are urged to make such submissions.
Delimiters n1 n2 indicate the range of the target sequence predicted
as having no homologue in the current PDB.
Omission of n1 n2 indicates the entire target (see Example 1 (C)).
PARENT mabc_A
Indicates that the model or the independent segment of structure is
based on a single PDB entry mabc chain A (use _A to indicate chain A).
Most threading and sequence search submissions would now be submitted
with this form of the PARENT record. A comparative modeler using a
single parent structure would also use this form. Note that, in order
to be accepted, the code must correspond to a current PDB entry.
PARENT mcdc ndef_g [ohij_k ...]
Is used only in comparative modeling and indicates that the model is
based on more than one structure template. Up to five PDB chains
may be listed here with additional detailed information included in
the METHOD records. In threading and sequence search, subdomains of
the target structure found to correspond to different known folds
should be submitted as independent segments of structure with
reference to only one PDB chain per segment.
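The four forms of the PARENT record described above can be told apart by their arguments. The sketch below is one possible reading of those rules. The function name and the returned labels are hypothetical, and it does not check that a code corresponds to a current PDB entry.

```python
def classify_parent(line: str):
    """Classify a PARENT record and return (kind, details).

    kind is one of 'ab initio', 'no template', 'single template',
    'multiple templates'. details is None, an (n1, n2) range, or a list
    of (pdb_code, chain) tuples where chain is None when no _X suffix
    was given.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "PARENT":
        raise ValueError(f"not a PARENT record: {line!r}")
    args = fields[1:]
    if args == ["N/A"]:
        return "ab initio", None
    if args[0] == "NONE":
        if len(args) == 1:
            return "no template", None  # applies to the entire target
        if len(args) == 3:
            return "no template", (int(args[1]), int(args[2]))
        raise ValueError("PARENT NONE takes either no range or n1 n2")
    if len(args) > 5:
        raise ValueError("at most five PDB chains may be listed")
    templates = []
    for arg in args:
        code, _, chain = arg.partition("_")  # 'mabc_A' -> ('mabc', 'A')
        templates.append((code, chain or None))
    kind = "single template" if len(templates) == 1 else "multiple templates"
    return kind, templates


if __name__ == "__main__":
    for record in ("PARENT N/A", "PARENT NONE 10 45", "PARENT mabc_A"):
        print(record, "->", classify_parent(record))
```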
Record TER is used to terminate an independent segment of structure
(PFRMAT TS and PFRMAT AL).
TER
3D atomic coordinates (PFRMAT TS).
Standard PDB atom records are used for the atomic coordinates.
Coordinates for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
It is requested that coordinate data be supplied for at least all
non-hydrogen main chain atoms, i.e. the N, CA, C and O atoms of every
residue. Specifically, if only CA atoms are predicted by the method,
predictors are encouraged to build the main chain atoms for every residue
before submission to CASP. One program that can make such a conversion is
Maxsprout server of Liisa Holm and co-workers. (If only CA atoms were
submitted it would not be possible to run most of the analysis software,
which would severely limit the evaluation of that prediction.)
When multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common frame of
reference between the segments. For any given MODEL, no target residue may
be repeated among all such independent structure segments.
In comparative modeling and in threading, potential multi-domain nature of
targets will be addressed in the evaluation even if the prediction is made
in a single frame of reference (i.e. without separation into multiple
segments of structure). For such predictions segmentation should only be
used to allow multiple model predictions (effectively up to 5 predictions
for each such domain).
Notes:
- atoms for which a prediction has been made must contain "1.0" in
the occupancy field; those for which no prediction is made must
either contain "0.0" in that field or be skipped altogether
- error estimates, in Angstroms, must be provided in the temperature
factor field
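The TS requirements above (occupancy 1.0 for predicted atoms, error estimate in the temperature factor field, and N, CA, C and O present for every residue) can be illustrated with the standard fixed-column PDB ATOM layout. The sketch below assumes that layout. The helper names format_atom and missing_main_chain are hypothetical, and the atom-name padding rule only covers common protein atoms with one-letter element symbols.

```python
from typing import Dict, Iterable, List, Set, Tuple

MAIN_CHAIN = {"N", "CA", "C", "O"}


def format_atom(serial: int, name: str, res: str, chain: str, resseq: int,
                x: float, y: float, z: float, error: float) -> str:
    """Build an ATOM record for a predicted atom.

    Occupancy is written as 1.00 (prediction made) and the error estimate
    in Angstroms goes into the temperature factor field (columns 61-66).
    """
    # Names shorter than four characters start in column 14 (assumes a
    # one-letter element symbol such as C, N, O or S).
    padded = name if len(name) == 4 else f" {name:<3s}"
    return (f"ATOM  {serial:5d} {padded} {res:>3s} {chain:1s}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{error:6.2f}")


def missing_main_chain(lines: Iterable[str]) -> Dict[Tuple[str, int], List[str]]:
    """Map (chain, residue number) to the main chain atoms not predicted."""
    seen: Dict[Tuple[str, int], Set[str]] = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21], int(line[22:26]))
        atoms = seen.setdefault(key, set())
        # Atoms with occupancy 0.0 count as "no prediction made".
        if float(line[54:60]) > 0.0:
            atoms.add(line[12:16].strip())
    return {key: sorted(MAIN_CHAIN - atoms)
            for key, atoms in seen.items() if MAIN_CHAIN - atoms}


if __name__ == "__main__":
    model = [format_atom(1, "N", "GLU", " ", 1, 10.982, -9.774, 1.377, 0.5),
             format_atom(2, "CA", "GLU", " ", 1, 9.623, -9.833, 1.984, 0.5)]
    print("\n".join(model))
    print(missing_main_chain(model))
```

A CA-only model would be reported here with N, C and O missing for every residue, which is the situation the text asks predictors to avoid.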
An unambiguous alignment to a PDB entry used for threading predictions
(PFRMAT AL).
Alignment for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
The (four column) alignment data records provide: target residue one
letter symbol, target residue sequence number, PDB residue one letter symbol,
and PDB residue sequence number with an insertion code if necessary
(see Example 4):
aa1 n1 aa2 n2
Note:
- residues for which no prediction is made must be skipped
- if a chain ID is specified in the PDB template of the target, then
the target residue sequence number should be composed of a chain ID
and residue number, e.g. A2, B44
The PDB code with chain extension of the structure the alignment is based on
should be placed in the PARENT record.
Only one PDB code per independent structure segment is allowed.
PDB codes should refer to structures containing at least main chain atomic
coordinates (see TS format).
As in the coordinate submissions,
when multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common
frame of reference between the segments. For any given MODEL, no target
residue may be repeated among all such independent structure
segments. Potential multi-domain
nature of targets will be addressed in the evaluation even if the prediction
is made in a single frame of reference (i.e. without separation into multiple
segments of structure). For such predictions segmentation should only be
used to allow multiple model predictions (effectively up to 5 predictions
for each such domain).
The facility to translate sequence - structure alignments (AL format) into
standard PDB atom records (TS format) is available as an
additional service.
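The four-column AL records described in this section (aa1 n1 aa2 n2) can be read as follows. This is a sketch only: it assumes that a target chain ID is a single letter prefixed to n1 (e.g. A21) and that a PDB insertion code is a single letter appended to n2 (e.g. 12A), as in the examples of this document. The names AlignedPair, parse_al_record and parse_al_segment are illustrative.

```python
import re
from typing import Iterable, List, NamedTuple

TARGET_NUM = re.compile(r"^([A-Za-z]?)(\d+)$")   # optional chain ID + number
PDB_NUM = re.compile(r"^(-?\d+)([A-Za-z]?)$")    # number + optional insertion code


class AlignedPair(NamedTuple):
    target_aa: str
    target_chain: str  # "" when the target has no chain ID
    target_num: int
    pdb_aa: str
    pdb_num: int
    pdb_icode: str     # "" when there is no insertion code


def parse_al_record(line: str) -> AlignedPair:
    """Parse one alignment record such as 'N 23 A 12A' or 'M A21 V 11'."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    aa1, n1, aa2, n2 = fields
    target, pdb = TARGET_NUM.match(n1), PDB_NUM.match(n2)
    if target is None or pdb is None or len(aa1) != 1 or len(aa2) != 1:
        raise ValueError(f"malformed alignment record: {line!r}")
    return AlignedPair(aa1, target.group(1), int(target.group(2)),
                       aa2, int(pdb.group(1)), pdb.group(2))


def parse_al_segment(lines: Iterable[str]) -> List[AlignedPair]:
    """Parse the records of one MODEL, rejecting repeated target residues."""
    pairs, seen = [], set()
    for line in lines:
        pair = parse_al_record(line)
        key = (pair.target_chain, pair.target_num)
        if key in seen:
            raise ValueError(f"target residue repeated: {line!r}")
        seen.add(key)
        pairs.append(pair)
    return pairs


if __name__ == "__main__":
    for pair in parse_al_segment(["M 21 V 11", "P 22 D 12", "N 23 A 12A"]):
        print(pair)
```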
Secondary structure prediction (PFRMAT SS).
Data in this format is inserted between MODEL and END
records of the submission file.
The (three column) format record consists of residue code, secondary structure
assignment code, and a number specifying the associated confidence level:
aa ss p
The symbols for the 3 state secondary structure are 'H'=helix, 'E'=strand,
'C'=Coil. Confidence level is a probability of a residue being predicted
correctly with values in the range of 0.0 - 1.0. The entire sequence of the
target should always be given. If parts cannot be predicted a probability
of 0.0 should be used.
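The SS rules above (three columns, states H, E or C, confidence between 0.0 and 1.0, entire target sequence present) lend themselves to a simple check. The sketch below is an illustration, not the official validator. The function name is hypothetical, and stripping trailing "#" annotations is an assumption based on the annotated example in this document.

```python
from typing import Iterable, List, Tuple

STATES = {"H", "E", "C"}  # helix, strand, coil


def parse_ss_records(lines: Iterable[str],
                     target_sequence: str) -> List[Tuple[str, str, float]]:
    """Parse 'aa ss p' records and check them against the target sequence."""
    records = []
    for line in lines:
        # Assumption: anything after '#' is an annotation, as in the example.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns: {line!r}")
        aa, ss, p_text = fields
        p = float(p_text)
        if ss not in STATES:
            raise ValueError(f"unknown secondary structure code: {ss!r}")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence outside 0.0 - 1.0: {p}")
        records.append((aa, ss, p))
    predicted = "".join(aa for aa, _, _ in records)
    if predicted != target_sequence:
        raise ValueError("records do not cover the entire target sequence")
    return records


if __name__ == "__main__":
    data = ["H E 0.70  # <- residue code,", "L E 0.80", "E C 0.0"]
    print(parse_ss_records(data, "HLE"))
```

Residues that cannot be predicted stay in the file with a confidence of 0.0, so the sequence check still passes for them.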
Residue-Residue separation prediction (PFRMAT RR).
Data in this format is inserted between MODEL and END records of the
submission file.
Format for the predicted separation distance between pairs of residues.
The distance is defined as the separation between C-beta atoms (C-alpha for
glycine residues).
i j d1 d2 p
Notes (see Example 3):
- entire target sequence should be split over multiple lines with a
maximum of 50 residues per line
- for intrachain residue-residue contacts residue number indices
i and j should be used for distance specification (i < j), i.e.
only one diagonal of the separation matrix should be supplied
- the distances d1 and d2 (real numbers) should indicate the range of
Cb-Cb distance predicted for the residue pair (C-alpha for glycines)
- the real number p should range from 0.0 - 1.0 to indicate the
probability of the distance falling within the predicted range
- residue 'contacts' (defined here - as in CASP2 - as Cb-Cb<8A) can be
predicted with this format as:
i j 0 8 p
- any pair NOT listed is assumed to be NOT considered by predictor
- to evaluate the subset of residue-residue separation distances that
represent 'contacts', 4 separation interval bins will be used (as
in CASP2) (separation is calculated along the chain as a number of
residues between the residues in contact):
1 residue or more : 1-9999
from 1 to 4 residues : 1-4
from 5 to 8 residues : 5-8
9 residues or more : 9-9999
Example: Let's assume that two residues are in contact and there
are 6 residues in between them along the chain. This contact will
be classified as belonging to 5-8 separation interval bin and will
not be counted in 1-4 and 9-9999 bins
- in addition, in the evaluation of each prediction, 'p' will be
compared to what would be expected from random, i.e. the likelihood
observed in the database of protein structures for a pair of
residues with residue separation (distance) d1-d2; residue
separation (sequence) j-i; protein size; types of residue i, j.
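The four separation interval bins listed in the notes above overlap, so one contact can fall into more than one bin. The sketch below assigns bins from residue indices, assuming that "number of residues between" means |j - i| - 1. That reading follows the worked example in the notes, but the evaluation software may count differently. The names BINS and separation_bins are illustrative.

```python
from typing import List

# (label, lowest, highest) number of residues between the pair, as listed
# in the notes above.
BINS = (
    ("1-9999", 1, 9999),
    ("1-4", 1, 4),
    ("5-8", 5, 8),
    ("9-9999", 9, 9999),
)


def separation_bins(i: int, j: int) -> List[str]:
    """Return the labels of every bin that the contact (i, j) belongs to."""
    # Assumption: separation = number of residues strictly between i and j.
    between = abs(j - i) - 1
    return [label for label, low, high in BINS if low <= between <= high]


if __name__ == "__main__":
    # Residues 10 and 17 have 6 residues between them (11..16).
    print(separation_bins(10, 17))
```

Under this reading, sequence neighbours (no residues in between) belong to no bin.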
END record is used for all predictions and indicates the end of a
single model submission.
Predictions of multichain targets.
Atomic coordinates should contain chain IDs as provided in template files.
In residue-residue contact predictions residue
indices should be composed of chain ID and residue number, e.g. A2, B44
(see Example 5).
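Residue indices in RR records may therefore be plain integers or chain-qualified tokens such as A2 or B44. The following sketch parses both forms, assuming a single-letter chain ID and treating trailing "#" text as an annotation, as in the annotated examples of this document. The names RRRecord, split_index and parse_rr_record are hypothetical.

```python
import re
from typing import NamedTuple, Tuple

INDEX_RE = re.compile(r"^([A-Za-z]?)(\d+)$")  # optional chain ID + number


class RRRecord(NamedTuple):
    i: Tuple[str, int]  # (chain ID or "", residue number)
    j: Tuple[str, int]
    d1: float           # lower bound of the predicted distance range
    d2: float           # upper bound of the predicted distance range
    p: float            # probability of the distance falling within d1..d2


def split_index(token: str) -> Tuple[str, int]:
    """Split 'B44' into ('B', 44) and '44' into ('', 44)."""
    match = INDEX_RE.match(token)
    if match is None:
        raise ValueError(f"malformed residue index: {token!r}")
    return match.group(1), int(match.group(2))


def parse_rr_record(line: str) -> RRRecord:
    """Parse one 'i j d1 d2 p' record, with or without chain IDs."""
    fields = line.split("#", 1)[0].split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns: {line!r}")
    i, j = split_index(fields[0]), split_index(fields[1])
    d1, d2, p = (float(value) for value in fields[2:])
    # Within one chain only one diagonal is supplied, so i < j is required.
    if i[0] == j[0] and i[1] >= j[1]:
        raise ValueError(f"intrachain pairs need i < j: {line!r}")
    if d1 > d2 or not 0.0 <= p <= 1.0:
        raise ValueError(f"invalid distance range or probability: {line!r}")
    return RRRecord(i, j, d1, d2, p)


if __name__ == "__main__":
    print(parse_rr_record("A1 B10 0 8 0.70   # <- indices of residues"))
    print(parse_rr_record("1 9 0 8 0.70"))
```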
(A) An example of comparative modeling prediction. As this model is labeled UNREFINED, submission of a REFINED model is also required.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1 UNREFINED
PARENT 1abc 1def_A
ATOM      1  N   GLU     1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU     1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU     1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU     1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU     1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU     1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU     1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU     1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU     1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR     2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR     2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR     2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR     2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR     2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR     2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR     2       6.526 -15.019   2.864  1.00  1.60
ATOM     17  N   VAL     3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL     3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL     3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL     3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL     3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL     3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL     3       4.664 -11.514  -3.573  1.00  1.70
ATOM     24  N   GLU     4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU     4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU     4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU     4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU     4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU     4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU     4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU     4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU     4       2.246 -19.551  -0.108  1.00  0.90
TER
END
(B) A model consisting of 2 independent structure segments (could be a
target modeled from two PDB domains, where relative orientation is unknown;
could be 2 fragments predicted by ab initio methods - ab initio example
shown).
In a single MODEL no residue should appear twice among all such segments.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM      1  N   GLU     1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU     1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU     1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU     1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU     1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU     1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU     1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU     1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU     1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR     2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR     2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR     2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR     2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR     2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR     2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR     2       6.526 -15.019   2.864  1.00  1.60
ATOM     24  N   GLU     4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU     4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU     4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU     4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU     4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU     4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU     4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU     4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU     4       2.246 -19.551  -0.108  1.00  0.90
TER
PARENT N/A
ATOM     17  N   VAL     3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL     3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL     3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL     3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL     3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL     3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL     3       4.664 -11.514  -3.573  1.00  1.70
TER
END
(C) Threading/Fold Recognition prediction stating that target has no
homologue in the current PDB.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT NONE
TER
END
Note to predictors: it may be interesting to predict the secondary structure of proteins even when a clear structural homologue is known for the target. In cases where the target sequence is divergent from the template, secondary structure prediction may be more accurate than that implied by the template and vice versa.
PFRMAT SS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
H E 0.70   # <- residue code,
L E 0.80   # <- secondary structure assignment code,
E E 0.80   # <- the number specifying the associated
G E 0.60   #    confidence level
S C 0.90
I E 0.50
G E 0.40
I E 0.60
L E 0.70
L C 0.50
K C 0.50
K H 0.90
H H 0.90
E H 0.90
I H 0.80
V H 0.70
F C 0.90
D C 0.90
G H 0.40
C C 0.40
END
PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC   # <- entire target sequence (up to 50
HDFGRTYIWQMSD          #    residues per line)
1 9 0 8 0.70
1 10 0 8 0.70   # <- indices of residues: i and j (integers),
1 12 0 8 0.60   # <- the range of Cb-Cb distance predicted
1 14 0 8 0.20   #    for the residue pair: d1 and d2 (real),
1 15 0 8 0.10   # <- probability of the distance between
1 17 0 8 0.30   #    Cb atoms being within the specified
1 19 0 8 0.50   #    range: p (real)
2 8 0 8 0.90
3 7 0 8 0.70
3 12 0 8 0.40
3 14 0 8 0.70
3 15 0 8 0.30
4 6 0 8 0.90
7 14 0 8 0.30
9 14 0 8 0.50
END
(A) Format to express unambiguous alignments to PDB entries 'mabc_A' and 'nefg'.
PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_A
M 21 V 11
P 22 D 12
N 23 A 12A
F 24 F 12B
A 25 L 13
P 32 D 22
N 33 A 23
F 34 F 24
A 35 L 25
TER
PARENT nefg
E 75 T 73
T 76 T 74
V 77 A 75
D 78 D 76
G 79 D 77
R 80 R 78
TER
END
(B) Format to express unambiguous alignments to PDB entry 'mabc_D'. An example of how to use the AL format to submit a prediction of the target with a chain name of 'A'.
PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_D
M A21 V 11
P A22 D 12
N A23 A 12A
F A24 F 12B
A A25 L 13
P A32 D 22
N A33 A 23
F A34 F 24
A A35 L 25
TER
END
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM     17  N   VAL A   3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL A   3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL A   3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL A   3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL A   3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL A   3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL A   3       4.664 -11.514  -3.573  1.00  1.70
ATOM     24  N   GLU A   4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU A   4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU A   4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU A   4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU A   4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU A   4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU A   4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU A   4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU A   4       2.246 -19.551  -0.108  1.00  0.90
REMARK
REMARK NOTE: Predictor should NOT use TER separator between chains
REMARK       if multichain independent segment of structure has to
REMARK       be evaluated as a one fragment
REMARK
ATOM      1  N   GLU B   1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU B   1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU B   1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU B   1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU B   1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU B   1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU B   1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU B   1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU B   1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR B   2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR B   2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR B   2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR B   2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR B   2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR B   2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR B   2       6.526 -15.019   2.864  1.00  1.60
TER
END
(B) An example of how to use the RR format to submit a
prediction of interchain (chains A and B) residue-residue contacts defined as Cb-Cb distances < 8 A.
PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC   # <- entire target sequence (up to 50
HDFGRTYIWQMSD          #    residues per line)
A1 B9 0 8 0.70
A1 B10 0 8 0.70   # <- indices of residues: Ai and Bj,
A1 B12 0 8 0.60   # <- the range of Cb-Cb distance predicted
A1 B14 0 8 0.20   #    for the residue pair: d1 and d2 (real),
A1 B15 0 8 0.10   # <- probability of the distance between
A1 B17 0 8 0.30   #    Cb atoms being within the specified
A1 B19 0 8 0.50   #    range: p (real)
A2 B8 0 8 0.90
A3 B7 0 8 0.70
A3 B12 0 8 0.40
A3 B14 0 8 0.70
A3 B15 0 8 0.30
A4 B6 0 8 0.90
A7 B14 0 8 0.30
A9 B14 0 8 0.50
END