David Baker, University of Washington
Nick Grishin, University of Texas
David Jones, University College, London
Justin MacCallum, University of Calgary
Michael Sternberg, Imperial College, London
- Predictions for CASP11 may be submitted in 5 separate formats:
TS # 3D atomic coordinates (Tertiary Structure) prediction
AL # ALignment to PDB entries (obsolete format, allowed for old servers only)
RR # Residue-Residue separation distance prediction
DR # Order-Disorder Regions prediction
QA # Quality assessment
- One team may make a prediction of a target by submitting up to five models
in the TS or AL categories, and one model in RR and DR categories.
In the QA category predictors may submit two different models: the first after the
release of a limited number of server TS models (the prediction window is open for
two days after the tarball release), and the second after the release of a set of the
tentatively best 150 server TS models (see the QA format section for the timeline example).
- Submissions for refinement, contact-assisted and alignment-assisted
targets should be submitted in TS format. Input data for contact-assisted
predictions will be provided in RR format, for alignment-assisted - in AL format.
A starting model for the refinement will be selected from among the best server models.
- Each submission file should contain prediction for only one target.
- Each submission file should contain only one of the allowed format categories.
- Submission files in RR, DR and QA categories should contain only one model.
- Submission files in TS/AL categories may contain either one or several
models. Most of the evaluation and assessment will focus on
the model labeled '1' (model index 1, see MODEL record). Each model should
begin with the MODEL record, end with the END record, and contain no target
residue repetitions. You may specify only one set of required header fields
(PFRMAT, TARGET, AUTHOR, METHOD) above the first MODEL record in the prediction
file. A multiple-model file will be split into separate files (one model per
file) and each model (up to 5) will be sent separately to the verification server.
- Submission of a duplicate model (same target, format category, group, model
index) will replace the previously accepted model, provided it is received before
the deadline.
- Each submission must begin with the PFRMAT, TARGET and AUTHOR records,
contain the METHOD field and at least one block starting with the MODEL
and ending with the END record.
- Each submitted model is automatically verified by the format verification
server. In case of successful submission no confirmation email will be sent.
A unique ACCESSION CODE is composed from the number of the target, prediction
format category, prediction group number, and model index.
Example:
Accession code T0444TS005_2 has the following components:
T0444 target number
TS Tertiary Structure (PFRMAT TS)
005 prediction group 5
2 model index 2
The accepted predictions can be viewed using the Model Viewer link on the CASP11
web page.
If the submission contains an error, the regular group leader or server contact
person will be immediately notified through email. If your prediction is rejected
for format inconsistency, you will have the possibility to correct problems and
re-send prediction(s) within the target prediction time window.
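The accession code layout described above can be sketched in a few lines of Python. This is only an illustration: the field widths (four digits for the target, three for the group) are assumptions read off the single example T0444TS005_2, and the function names are hypothetical.

```python
import re

# Assumed layout, inferred from the example T0444TS005_2:
# 'T' + 4-digit target number, 2-letter format code,
# 3-digit group number, '_' and the model index.
ACCESSION_RE = re.compile(r"^(T\d{4})(TS|AL|RR|DR|QA)(\d{3})_(\d+)$")


def parse_accession_code(code):
    """Split an accession code into its four components."""
    match = ACCESSION_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognised accession code: {code!r}")
    target, category, group, model = match.groups()
    return {"target": target, "format": category,
            "group": int(group), "model": int(model)}


def build_accession_code(target, category, group, model):
    """Compose an accession code from its components."""
    return f"{target}{category}{group:03d}_{model}"
```

For instance, parse_accession_code("T0444TS005_2") yields target T0444, format TS, group 5 and model index 2.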
Submission rules for regular prediction groups (usually, 3-week deadline in TS category)
- Predictions can be submitted by a group leader or a group member with submission
privileges. The group leader can set the privileges (regular member or submitter)
for every member of the group using the 'Review member status' option under the
'My CASP11 profile' link. Members of prediction groups who intend to submit predictions
should first receive submission permission from the group leader and then use
the 12-digit Registration Code of the group to submit predictions for that group.
- Models for regular deadline groups should be submitted directly by e-mail to
models AT predictioncenter.org or using
the CASP11 model submission facility.
- When sending predictions by email, please send them in the body of the message.
Predictions in attachments to the emails will be rejected.
- When sending predictions by email, please remember to send them only from
the email address registered with the Prediction Center (make sure we have
your current email address on file; you can check it under the "My Personal Data"
link in the menu). If you temporarily cannot use the registered email address for
submission, please use the submission form instead.
- The time allowed for returning regular group predictions is set separately for each target
through the Target List form. Regular deadline predictors usually have around 3 weeks
from the date of target release to return a prediction. For the most difficult
targets this period is usually slightly longer.
- Predictions in TS and RR categories should normally be sent only for all-group targets.
- Predictions in DR category should be sent on all targets. The DR predictions on server-only targets are due on the server expiration deadline.
- Multimeric predictions may be sent for all targets. They are strongly encouraged for the targets
marked as 'CAPRI' regardless of the target type (i.e., for both all-group and server-only targets). Human expiration dates apply to all multimeric predictions (usually 3 weeks for all-group targets and 10 days for server-only targets).
- Deadlines for predictions in QA category are the same for regular and server groups (2 days).
Submission rules for server groups (3-day deadline in non-QA prediction categories, 2 day deadline for QA)
- CASP11 queries will be sent to the registered servers from the CASP distribution
server casp-meta AT predictioncenter.org. Email servers are advised
to reply to this address immediately upon receiving the query with an acceptance
email with the subject: "T0999 - query received by MY_SERVER". This will help us
track whether your server received a request from us so that we can address
any connectivity issues promptly. Please do not send your predictions to this address as
they will be ignored.
- We will be sending 3 variables to your server's submission URL (or email):
the SEQUENCE, the TARGET-NAME and the REPLY-E-MAIL (where to return the results).
For the servers participating in quality assessment, contact-assisted and
alignment-assisted categories of prediction we will be sending the TARBALL-LOCATION
variable instead of (or in addition to, if you specify so) the SEQUENCE.
Names for these server-specific parameters will be taken from your server
registration form.
- Server models should be returned automatically to the address specified in the
REPLY-E-MAIL field of the query. Please note that the return address should be always
taken from our query and not hard-coded as we may change it during the season.
- TS, RR and DR servers are requested to return predictions within 72 hours of the target release time.
No additional time for corrections will be allotted, but corrections will be accepted
within the original 72 hour window. Please send your corrections manually to the
address specified in the REPLY-E-MAIL field of the original query. Remember that
corrections can be submitted only by a group leader or a group member with submission
privileges. The group leader can set the privileges (regular member or submitter)
for every member of the group using the 'Review member status' option under the
'My CASP11 profile' link. Members of prediction groups who intend to submit predictions
should receive submission permission from the group leader first.
- Server models must be submitted in the body of the email as plain text.
Predictions in attachments to the emails will be rejected. The subject of the email
should preferably contain the target number and the group name.
- Each submission may contain several models. If a server returns more than 5
models, the models numbered 6 and higher will be ignored (or 2 and higher for the RR
and DR categories). In the QA category either model 1 or model 2 will be accepted
depending on the stage of the QA request (see the General Rules above or the description
of the MODEL record below).
- The submission engine will resend the query if it encounters obvious connection problems
(network timeouts, 'no response' etc.). Failures that go beyond that require special attention,
but we will make every effort to notify server curators as soon as possible if we suspect
something is not working. A facility for checking which server predictions have been
accepted is available from our website.
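The server rules above (acknowledge the query, take the return address from the query itself, and send the prediction as plain text in the message body) could be sketched as follows. The dictionary keys are assumptions: the real parameter names come from each server's registration form, and both function names are hypothetical.

```python
from email.message import EmailMessage


def acknowledgement_subject(target_name, server_name):
    """Subject line for the 'query received' reply described above."""
    return f"{target_name} - query received by {server_name}"


def build_prediction_email(query, prediction_text, group_name, sender):
    """Build a plain-text message with the prediction in the body.

    `query` is assumed to be a dict holding the variables sent by the
    distribution server; the keys used here are illustrative only.
    """
    msg = EmailMessage()
    msg["From"] = sender
    # The return address is read from the query, never hard-coded.
    msg["To"] = query["REPLY-E-MAIL"]
    # The subject carries the target number and the group name.
    msg["Subject"] = f"{query['TARGET-NAME']} {group_name}"
    # set_content() with a str creates a single text/plain part (no attachments).
    msg.set_content(prediction_text)
    return msg
```

Sending the message (for example with smtplib) is left out, since it depends on the local mail setup.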
Format description
All submissions contain records described below.
Each of these records must begin with a standard keyword.
In all submissions standard keywords must
begin in the first column of a record.
The keyword set is as follows:
PFRMAT Format specification code: TS, AL, RR, DR, QA
TARGET Target identifier from the CASP11 target table
AUTHOR XXXX-XXXX-XXXX Registration code of the Group Leader or Server Group Name
SCORE Reliability of the model (optional)
REMARK Comment record (may appear anywhere after the first 3 required lines, optional)
METHOD Records describing the methods used
MODEL Beginning of the data section for the submitted model
PARENT Specifies structure template used to generate the TS/AL model
TER Terminates independent segments of structure in the TS/AL model
END End of the submitted model
Models should be submitted in Plain Text format.
Record PFRMAT should appear on the first line of the prediction and
is used for all submissions.
PFRMAT TS
TS indicates that submission contains 3D atomic coordinates
in standard PDB format
PFRMAT RR
RR indicates that submission contains a residue-residue
separation distance prediction
PFRMAT AL
AL indicates that submission contains unambiguous alignments
to PDB entries
PFRMAT DR
DR indicates that submission contains an order-disorder regions
prediction
PFRMAT QA
QA indicates a model quality assessment prediction
Record TARGET should appear on the second line of the prediction and
is used for all submissions.
TARGET Txxxx
Txxxx indicates id of the target predicted.
Record AUTHOR should appear on the third line of the prediction
and is used for all submissions.
For all groups:
AUTHOR XXXX-XXXX-XXXX
XXXX-XXXX-XXXX indicates the Group Registration code.
This is the code obtained by the group leader upon registration.
Note: Members of prediction groups who intend to submit predictions
should receive submission permissions from the group leader and
use the registration code of the Group for all predictions submitted by
that group. If sending predictions by email, please send them from the
registered emails of the group leader or group submitter.
If you temporarily cannot use these emails for submission, please log in
to our website and then use our web-based submission facility.
Servers alternatively can be identified using their registered group names:
AUTHOR MY_SERVER_NAME
or
REMARK AUTHOR MY_SERVER_NAME
where MY_SERVER_NAME is a name selected for the server group at registration
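As a rough pre-submission check, the ordering of the three leading records can be tested locally. The sketch below is not the verification server's logic; it only encodes the rules stated above (PFRMAT, TARGET and AUTHOR on lines 1 to 3, keywords starting in the first column, at least one METHOD record). It does not cover the alternative "REMARK AUTHOR" form allowed for servers, and the function name is hypothetical.

```python
REQUIRED_ORDER = ("PFRMAT", "TARGET", "AUTHOR")
FORMAT_CODES = {"TS", "AL", "RR", "DR", "QA"}


def check_header(text):
    """Return a list of problems found in the header of a submission."""
    problems = []
    lines = text.splitlines()
    for position, keyword in enumerate(REQUIRED_ORDER):
        line = lines[position] if position < len(lines) else ""
        fields = line.split()
        # Keywords must begin in the first column of the record.
        if not line.startswith(keyword):
            problems.append(f"line {position + 1} must start with {keyword}")
        elif len(fields) < 2:
            problems.append(f"{keyword} record has no value")
        elif keyword == "PFRMAT" and fields[1] not in FORMAT_CODES:
            problems.append(f"unknown format code: {fields[1]}")
    if not any(line.startswith("METHOD") for line in lines):
        problems.append("no METHOD record found")
    return problems
```

An empty list means that none of these particular checks failed; it does not guarantee acceptance by the format verification server.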
SCORE Optional. This record may be used to report a model
reliability score. It will not influence the evaluation.
REMARK Optional. PDB style 'REMARK' records may be used
anywhere in the submission. These records may contain any
text and will in general not influence evaluation.
Records METHOD are used for all submissions.
These records describe the methods used. Predictors are urged to provide
as full a description of the methods as possible, including references,
data libraries used, and values of default and non-default parameters.
These descriptions will be made available via the Prediction Center WEB
pages as well as printed along with the other materials distributed at the
meeting. Length of 100 - 500 words is suggested.
Record MODEL is used for all submissions.
Signifies the beginning of model data.
MODEL n
n Model index n is used to indicate predictor's ranking
according to her/his belief which TS/AL model is closest to the
target structure (1 <= n <= 5). Model index is included
automatically in the ACCESSION CODE. All models with index
higher than 5 will be discarded.
Model index should be set to 1 in RR and DR categories.
In QA category, predictors are requested to use model index '1' for the predictions
submitted at the first QA stage (i.e., for the quality estimates made on the selected
set of server models released 5 days after the target release for tertiary structure
prediction), and use model index '2' for the predictions submitted on a larger set of
TS models at the second QA stage (i.e., for the quality estimates made on the models
released 2 days after the release of the first set of models for QA prediction).
Record PARENT is required only for the submissions in the TS and AL
format.
PARENT record indicates structure templates used to generate any independent
segment of MODEL (see description of the TS format below).
The PARENT record should be placed as the first record of any such independent
segment. Only one PARENT record per structure segment is allowed. For multimeric
predictions only one PARENT record per whole structure is allowed.
PARENT N/A
Indicates that a prediction is not directly based on any known
structure. Note that this is the only indication in the file that the
prediction is ab initio, so it is a critical piece of information.
PARENT 1abc_A
Indicates that the model or the independent segment of structure is
based on a single PDB entry 1abc chain A (use _A to indicate chain A).
All template-based predictions should be submitted with this form
of the PARENT record. Note that, in order to be accepted, the code
must correspond to a current PDB entry.
PARENT 1cdc 2def_g [3hij_k ...]
Indicates that the model is based on more than one structural template.
Up to five PDB chains may be listed here with additional detailed information
included in the METHOD records. Subdomains of the target structure found
to correspond to different known folds may be submitted as independent
segments of structure with reference to only one PDB chain per segment.
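The three forms of the PARENT record can be told apart with a small parser. The sketch below assumes classic four-character PDB codes with an optional single-character chain suffix; it cannot check whether a code corresponds to a current PDB entry, and the function name is hypothetical.

```python
import re

# Assumed template syntax: four-character PDB code, optionally followed
# by '_' and a one-character chain ID (e.g. 1abc or 1abc_A).
TEMPLATE_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}(_[A-Za-z0-9])?$")


def parse_parent(record):
    """Return [] for 'PARENT N/A', otherwise the list of templates."""
    fields = record.split()
    if not fields or fields[0] != "PARENT":
        raise ValueError("not a PARENT record")
    templates = fields[1:]
    if templates == ["N/A"]:
        return []  # prediction not based on any known structure
    if not 1 <= len(templates) <= 5:
        raise ValueError("expected N/A or one to five templates")
    for template in templates:
        if not TEMPLATE_RE.match(template):
            raise ValueError(f"malformed template id: {template}")
    return templates
```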
Record TER is used to terminate an independent segment of structure
(PFRMAT TS and PFRMAT AL). Every TER record should correspond
to the preceding PARENT record in the model.
TER
3D atomic coordinates (PFRMAT TS).
Standard PDB atom records are used for the atomic coordinates. The submission
format requires 80-column records; pad them with spaces
where needed (see target template PDB files as provided in specific target
descriptions available through the CASP11 target table).
Coordinates for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
It is requested that coordinate data be supplied for at least all
non-hydrogen main chain atoms, i.e. the N, CA, C and O atoms of every residue.
Specifically, if only CA atoms are predicted by the method, predictors are
encouraged to build the main chain atoms for every residue before submission
to CASP. One program that can make such a conversion is the Maxsprout server
of Liisa Holm and co-workers. (If only CA atoms were submitted it would not be
possible to run most of the analysis software, which would severely limit the
evaluation of that prediction.)
When multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common
frame of reference between the segments. For any given MODEL, no target
residue may be repeated among all such independent structure segments.
Even though all of the independent PARENT-TER frames will be evaluated,
only the best scoring frame will contribute to the group score on any
given evaluation domain. Potential multi-domain nature of targets will be
addressed in the evaluation even if the prediction
is made in a single frame of reference (i.e. without separation into multiple
segments of structure).
For quaternary structure predictions, coordinates for all chains should be submitted
in the same frame of reference and therefore only one PARENT - TER section is allowed
per prediction. This means that no TER record should separate different chains
(this is different from the PDB!). The first chain should be labeled A and all
subsequent chains should follow the Latin alphabet, e.g., the chains of a tetramer should
be labeled A, B, C, D.
There will be no announcements or assignments of targets to
the oligomeric prediction category. Instead, for every target
you can submit either a tertiary structure prediction or a quaternary
structure prediction. There is no need to submit a monomer in addition to a multimer:
we will automatically extract the coordinates of the first chain from the quaternary
prediction and save it as a monomer for mainstream evaluation alongside
the monomers submitted by other groups. Multimeric predictions will be evaluated
separately. The tentative oligomeric state of the protein (if provided by the
experimentalists) will be announced through our Target List page, but it is up to
the predictor to decide what oligomerization state the protein is in.
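The chain labeling rule and the first-chain extraction described above can be illustrated as follows. This is a sketch under the assumption that ATOM records follow the fixed-column PDB layout with the chain ID in column 22; it is not the Prediction Center's own extraction code, and the function names are hypothetical.

```python
import string


def chain_labels(n_chains):
    """Labels for an n-chain model: A, B, C, ... in alphabetical order."""
    if not 1 <= n_chains <= 26:
        raise ValueError("this sketch supports 1 to 26 chains")
    return list(string.ascii_uppercase[:n_chains])


def extract_first_chain(atom_lines):
    """Keep only ATOM records whose chain ID (column 22) is 'A'."""
    return [line for line in atom_lines
            if line.startswith("ATOM") and len(line) > 21 and line[21] == "A"]
```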
Atoms for which a prediction has been made must contain a value between 0.01 and 1.00
(usually "1.00") in the occupancy field; those for which no prediction has been
made must either contain "0.00" in that field or be skipped altogether.
In place of the temperature factor field, error estimates, in Angstroms, should
be provided. We require predictors to submit error estimates for their own
predictions as these results will be separately evaluated in the quality
assessment category. Models with all residues having the same 'B-factor'
will be rejected.
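A minimal sketch of writing one coordinate record with the occupancy and the error estimate in the positions described above. It assumes the standard PDB ATOM column layout (occupancy in columns 55-60, temperature factor in columns 61-66) and pads the record to 80 columns; the function names are hypothetical.

```python
def format_atom_record(serial, name, resname, resseq, x, y, z,
                       error, occupancy=1.00, chain=" "):
    """Format one ATOM record; `error` goes in the B-factor field."""
    # Atom names shorter than four characters start in column 14.
    padded_name = name if len(name) == 4 else f" {name:<3s}"
    record = (f"ATOM  {serial:5d} {padded_name} {resname:>3s} {chain}"
              f"{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
              f"{occupancy:6.2f}{error:6.2f}")
    return record.ljust(80)  # pad with spaces to 80 columns


def has_varied_error_estimates(records):
    """False if every record carries the same value in columns 61-66."""
    return len({record[60:66] for record in records}) > 1
```

The second function mirrors the rule that models in which every residue has the same 'B-factor' are rejected; it compares the formatted fields only.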
An unambiguous alignment to a PDB entry used for threading predictions
(PFRMAT AL).
This format is deprecated and allowed for old structure prediction servers only.
Alignment for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
The (four column) alignment data records provide: target residue one
letter symbol, target residue sequence number, PDB residue one letter symbol,
and PDB residue sequence number with an insertion code if necessary
(see Example 3):
aa1 n1 aa2 n2
Note:
- residues for which no prediction is made must be skipped
- if a chain ID is specified in the PDB template of the target, then
the target residue sequence number should be composed of a chain ID
and residue number, e.g. A2, B44
The PDB code with chain extension of the structure the alignment is based on
should be placed in the PARENT record.
Only one PDB code per independent structure segment is allowed.
PDB codes should refer to structures containing at least the main chain atomic
coordinates (see the TS format).
As in the case of coordinate submissions,
when multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common
frame of reference between the segments. For any given MODEL, no target
residue may be repeated among all such independent structure
segments. Potential multi-domain
nature of targets will be addressed in the evaluation even if the prediction
is made in a single frame of reference (i.e. without separation into multiple
segments of structure). For such predictions segmentation should only be
used to allow multiple model predictions (effectively up to 5 predictions
for each such domain).
Note:
The facility to translate sequence - structure alignments (AL format) into
standard PDB atom records (TS format) is available as an additional
AL2TS service.
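The four-column alignment record can be read with a short parser. The sketch assumes that a chain ID, when present, is a single letter prefixed to the target residue number (e.g. A21) and that an insertion code is a single letter suffixed to the PDB residue number (e.g. 12A); the function name and dictionary keys are hypothetical.

```python
import re

TARGET_NUM_RE = re.compile(r"^([A-Za-z]?)(\d+)$")   # optional chain ID + number
PDB_NUM_RE = re.compile(r"^(-?\d+)([A-Za-z]?)$")    # number + optional insertion code


def parse_alignment_record(line):
    """Parse one 'aa1 n1 aa2 n2' record of the AL format."""
    aa1, n1, aa2, n2 = line.split()
    target = TARGET_NUM_RE.match(n1)
    template = PDB_NUM_RE.match(n2)
    if target is None or template is None:
        raise ValueError(f"malformed alignment record: {line!r}")
    return {
        "target_aa": aa1,
        "target_chain": target.group(1) or None,
        "target_num": int(target.group(2)),
        "pdb_aa": aa2,
        "pdb_num": int(template.group(1)),
        "pdb_icode": template.group(2) or None,
    }
```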
Residue-Residue separation prediction (PFRMAT RR).
Data in this format are inserted between MODEL and END records of the
submission file.
The prediction should start with the sequence of the predicted target,
split (if necessary) into several rows (see Example 2).
The sequence should be followed by the list of contacts in the
five-column format:
i j d1 d2 p
Notes (see Example 2):
- indices i and j of the two residues in contact should be provided
such that i < j, i.e. only half of the contact map is supplied.
- the numbers d1 and d2 indicate the distance limits defining a contact.
In CASP, a pair of residues is defined to be in contact when
the distance between their C-beta atoms (C-alpha in case of glycine)
is less than 8 Angstroms. Therefore, typically d1=0 and d2=8.
These parameters are currently placeholders and are kept in the format
only for consistency with previous CASPs.
- the real number p indicates probability of the two residues being
in contact, and should be in the range 0.0 - 1.0. Values larger
than 0.5 identify the pairs of residues that are predicted to be
more likely in contact than not. In binary (two-class) evaluations,
the probability value of 0.5 will be considered as the cutoff
separating contacts from non-contacts.
NEW! Contacts in the prediction should be listed
in order of decreasing probability p. If several contacts
are assigned the same probability, for evaluation purposes
they will be considered in the order provided in the prediction.
- any pair NOT listed is assumed to be predicted as not in contact.
- for multichain predictions, residue indices should be composed of
chain ID and residue number, e.g. A2, B44 (see Example 4B).
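The rules in the notes above (i < j, p between 0.0 and 1.0, contacts listed in order of decreasing p) lend themselves to a quick local check. The sketch below handles single-chain predictions with integer indices only and expects contact lines without trailing comments; the function name is hypothetical.

```python
def check_contacts(lines):
    """Return a list of problems found in 'i j d1 d2 p' contact lines."""
    problems = []
    previous_p = 1.0
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 5:
            problems.append(f"line {number}: expected 5 columns")
            continue
        i, j = int(fields[0]), int(fields[1])
        p = float(fields[4])
        if not i < j:
            problems.append(f"line {number}: indices must satisfy i < j")
        if not 0.0 <= p <= 1.0:
            problems.append(f"line {number}: p must be within 0.0 - 1.0")
        if p > previous_p:
            problems.append(f"line {number}: not in order of decreasing p")
        previous_p = p
    return problems
```

Equal probabilities on consecutive lines are accepted, since ties are evaluated in the order given.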
Order-disorder regions prediction (PFRMAT DR).
Data in this format are inserted between MODEL and END
records of the submission file.
The (three column) format record consists of residue code, Order/Disorder
prediction code, and a number specifying the associated confidence level:
aa OD p
The symbols for the 2 state order/disorder prediction are
'O'=order, 'D'=disorder.
Last column should indicate a probability of a residue being in the
disordered region. The value of this confidence level is in the
range of 0.0 - 1.0 (values 0.51 and higher designate disordered state).
The entire sequence of the target should always be given. If parts
cannot be predicted a probability value of 0.5 should be used (see Example 5).
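The three-column DR records can be generated from per-residue disorder probabilities as sketched below. Two points are assumptions rather than rules from this document: residues without a prediction (passed as None) get probability 0.5 and the code 'O', and probabilities are written with two decimals. The function name is hypothetical.

```python
def format_dr_records(sequence, disorder_probabilities):
    """Build 'aa OD p' records for the entire target sequence."""
    if len(sequence) != len(disorder_probabilities):
        raise ValueError("one probability is needed per residue")
    records = []
    for aa, p in zip(sequence, disorder_probabilities):
        if p is None:
            p = 0.5  # residue that could not be predicted
        if not 0.0 <= p <= 1.0:
            raise ValueError("probability must be within 0.0 - 1.0")
        # Values of 0.51 and higher designate the disordered state.
        state = "D" if round(p, 2) >= 0.51 else "O"
        records.append(f"{aa} {state} {p:.2f}")
    return records
```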
Quality assessment prediction (PFRMAT QA).
In QA category, predictors are requested to use model index '1' for predictions
submitted in the first stage (i.e., estimating quality of the selected
server models released 5 days after the initial target release),
and use model index '2' for predictions submitted on the second, larger set of
TS models (i.e., estimating quality of models released 7 days after the
initial target release).
Timeline example.
May 1, 9am PDT - target T0644 is released for prediction in non-QA categories.
May 4, noon - the deadline for submitting tertiary structure predictions by servers.
May 6, noon - the first set of server TS predictions (up to 20 models selected
primarily to test single-model methods) is sent to the registered QA servers and
posted on the casp11 archive page (http://predictioncenter.org/download_area/CASP11/server_predictions/).
QA predictions (marked as MODEL 1) for this subset are accepted for two days.
May 8, noon - deadline for "stage 1" QA predictions. The second set of server TS predictions
(150 models selected to test both single-model and clustering methods) is sent to the
registered QA servers and posted on the casp11 archive page. QA predictions
(marked as MODEL 2) for this second subset of models are accepted for two more days.
May 10, noon - deadline for "stage 2" QA predictions. All server TS predictions are posted on
the casp11 archive page. No further QA predictions (from servers or manual groups) are accepted
for this target.
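The two-day windows in the timeline above can be reproduced with simple date arithmetic. In the sketch below the function name is hypothetical and time zones are ignored.

```python
from datetime import datetime, timedelta

QA_WINDOW = timedelta(days=2)  # each QA stage is open for two days


def qa_deadlines(stage1_release):
    """Derive the QA stage deadlines from the stage 1 release time."""
    stage1_deadline = stage1_release + QA_WINDOW
    # The second model set is released at the stage 1 deadline.
    stage2_release = stage1_deadline
    stage2_deadline = stage2_release + QA_WINDOW
    return {"stage1_deadline": stage1_deadline,
            "stage2_release": stage2_release,
            "stage2_deadline": stage2_deadline}
```

With a stage 1 release at noon on May 6 (the year passed to datetime is arbitrary for this purpose), the function returns noon on May 8 and noon on May 10, matching the example.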
Data are inserted between MODEL and END records of the submission file.
You may submit your quality assessment prediction in one of the two different modes:
QMODE 1 : global model quality score (MQS - one number per model)
QMODE 2 : MQS and error estimates on per-residue basis.
The first line of data should specify mode identifier, i.e. QMODE (see Example 6).
In both modes, the first column in each line contains model identifier (file name of the
accepted 3D prediction).
The second column contains reliability score for a model as a whole.
The reliability score is a real number between 0.0 and 1.0 (1.0 being a perfect model).
If you don't provide MQS for a model please put "X" in the corresponding place.
If you don't want to additionally provide error estimates on per residue basis
(QMODE 1), your data table will consist of these two columns only.
If you do additionally provide residue error estimates (QMODE 2),
each consecutive column should contain the error estimate in Angstroms for
consecutive residues in the target (i.e., column 3 corresponds to residue 1 in
the target, column 4 to residue 2 and so on). This way the data constitute a table
(Number_of_models_for_the_target) BY (Number_of_residues_in_the_target + 1).
Do not skip columns if you are not predicting error estimates for some residues -
instead put "X" in the corresponding column.
Please specify in the REMARK records what you consider to be an error estimate for a residue
(CA location error, geometrical center error, etc.).
Note 1. Please be advised that a QA record line may be very long, and some
editors/mailing programs may force a line wrap, potentially causing unexpected parsing errors.
To avoid this problem we recommend that you split long lines into shorter sublines
(50-100 columns of data) yourself. Our parser will consider consecutive sublines
(starting with the line containing the evaluated model name and ending with the line
containing the next model name or the END tag) to be part of the same logical line.
Note 2. Please be advised that model quality predictions in CASP are evaluated
by comparing submitted estimates of global reliability and per-residue accuracy of structural
models with the values obtained from the LGA superpositions of the corresponding models
with experimental structures. Therefore, perfect global model scores in QMODE1 (QA1) should
ideally correspond to the GDT_TS scores, and predicted per-residue distances
in QMODE2 should ideally reproduce those extracted from the optimal model-target superpositions.
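The QA data layout and the subline splitting recommended in Note 1 can be sketched as follows. The number formatting (three decimals) and the default of 50 columns per subline are assumptions within the stated 50-100 range; the function name is hypothetical.

```python
def format_qa_record(model_name, score, residue_errors=None, max_columns=50):
    """Format one QA record, split into sublines of at most max_columns.

    Pass residue_errors=None for QMODE 1 (global score only).
    A missing score or error estimate is written as "X".
    """
    def render(value):
        return "X" if value is None else f"{value:.3f}"

    fields = [model_name, render(score)]
    if residue_errors is not None:
        # QMODE 2: one column per residue, never skipping a column.
        fields.extend(render(error) for error in residue_errors)
    return [" ".join(fields[start:start + max_columns])
            for start in range(0, len(fields), max_columns)]
```

Only the first subline starts with the model name, so the following sublines are read as continuations of the same logical line.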
END record is used for all predictions and indicates the end of a
single model submission.
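Because every model is delimited by MODEL and END records, a multiple-model file can be split into one-model files that share the header. The sketch below illustrates that idea only; it is not the Prediction Center's splitter, it does not enforce the limit of five models, and the function name is hypothetical.

```python
def split_models(text):
    """Split a submission into one text per MODEL ... END block.

    Records that appear before the first MODEL record (PFRMAT, TARGET,
    AUTHOR, METHOD, ...) are treated as the shared header and are
    prepended to every returned model.
    """
    header, models, current = [], [], None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        keyword = line.split()[0]
        if keyword == "MODEL":
            current = [line]
        elif keyword == "END" and current is not None:
            current.append(line)
            models.append("\n".join(header + current))
            current = None
        elif current is not None:
            current.append(line)
        else:
            header.append(line)
    return models
```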
Example 1. Atomic coordinates (Tertiary Structure)
The primary CASP11 format used for tertiary structure prediction
(A) An example of prediction.
PFRMAT TS
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT 1abc 1def_A
ATOM 1 N GLU 1 10.982 -9.774 1.377 1.00 0.50
ATOM 2 CA GLU 1 9.623 -9.833 1.984 1.00 0.50
ATOM 3 C GLU 1 8.913 -11.104 1.521 1.00 0.50
ATOM 4 O GLU 1 9.187 -11.630 0.461 1.00 0.50
ATOM 5 CB GLU 1 8.814 -8.614 1.546 1.00 0.50
ATOM 6 CG GLU 1 7.372 -8.754 2.039 1.00 0.50
ATOM 7 CD GLU 1 7.339 -8.625 3.562 1.00 0.50
ATOM 8 OE1 GLU 1 8.370 -8.307 4.131 1.00 0.50
ATOM 9 OE2 GLU 1 6.284 -8.846 4.132 1.00 0.50
ATOM 10 N THR 2 7.998 -11.599 2.304 1.00 1.60
ATOM 11 CA THR 2 7.266 -12.832 1.907 1.00 1.60
ATOM 12 C THR 2 6.096 -12.456 1.005 1.00 1.60
ATOM 13 O THR 2 5.008 -12.217 1.466 1.00 1.60
ATOM 14 CB THR 2 6.731 -13.533 3.157 1.00 1.60
ATOM 15 OG1 THR 2 7.662 -13.379 4.220 1.00 1.60
ATOM 16 CG2 THR 2 6.526 -15.019 2.864 1.00 1.60
ATOM 17 N VAL 3 6.308 -12.396 -0.278 1.00 1.70
ATOM 18 CA VAL 3 5.190 -12.030 -1.187 1.00 1.70
ATOM 19 C VAL 3 3.954 -12.870 -0.844 1.00 1.70
ATOM 20 O VAL 3 2.834 -12.471 -1.090 1.00 1.70
ATOM 21 CB VAL 3 5.608 -12.274 -2.641 1.00 1.70
ATOM 22 CG1 VAL 3 5.542 -13.771 -2.959 1.00 1.70
ATOM 23 CG2 VAL 3 4.664 -11.514 -3.573 1.00 1.70
ATOM 24 N GLU 4 4.146 -14.029 -0.272 1.00 1.70
ATOM 25 CA GLU 4 2.976 -14.882 0.086 1.00 1.60
ATOM 26 C GLU 4 2.153 -14.190 1.175 1.00 1.50
ATOM 27 O GLU 4 0.942 -14.141 1.109 1.00 1.40
ATOM 28 CB GLU 4 3.465 -16.238 0.597 1.00 1.30
ATOM 29 CG GLU 4 2.336 -17.264 0.479 1.00 1.20
ATOM 30 CD GLU 4 2.929 -18.671 0.391 1.00 1.10
ATOM 31 OE1 GLU 4 4.056 -18.846 0.823 1.00 1.00
ATOM 32 OE2 GLU 4 2.246 -19.551 -0.108 1.00 0.90
TER
END
(B) A model consisting of 2 independent structure segments (could be a target
modeled from two PDB domains, where relative orientation is unknown;
could be 2 fragments predicted by ab initio methods - ab initio example shown).
In a single MODEL no residue should appear twice among all such segments.
PFRMAT TS
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM 1 N GLU 1 10.982 -9.774 1.377 1.00 0.50
ATOM 2 CA GLU 1 9.623 -9.833 1.984 1.00 0.50
ATOM 3 C GLU 1 8.913 -11.104 1.521 1.00 0.50
ATOM 4 O GLU 1 9.187 -11.630 0.461 1.00 0.50
ATOM 5 CB GLU 1 8.814 -8.614 1.546 1.00 0.50
ATOM 6 CG GLU 1 7.372 -8.754 2.039 1.00 0.50
ATOM 7 CD GLU 1 7.339 -8.625 3.562 1.00 0.50
ATOM 8 OE1 GLU 1 8.370 -8.307 4.131 1.00 0.50
ATOM 9 OE2 GLU 1 6.284 -8.846 4.132 1.00 0.50
ATOM 10 N THR 2 7.998 -11.599 2.304 1.00 1.60
ATOM 11 CA THR 2 7.266 -12.832 1.907 1.00 1.60
ATOM 12 C THR 2 6.096 -12.456 1.005 1.00 1.60
ATOM 13 O THR 2 5.008 -12.217 1.466 1.00 1.60
ATOM 14 CB THR 2 6.731 -13.533 3.157 1.00 1.60
ATOM 15 OG1 THR 2 7.662 -13.379 4.220 1.00 1.60
ATOM 16 CG2 THR 2 6.526 -15.019 2.864 1.00 1.60
ATOM 24 N GLU 4 4.146 -14.029 -0.272 1.00 1.70
ATOM 25 CA GLU 4 2.976 -14.882 0.086 1.00 1.60
ATOM 26 C GLU 4 2.153 -14.190 1.175 1.00 1.50
ATOM 27 O GLU 4 0.942 -14.141 1.109 1.00 1.40
ATOM 28 CB GLU 4 3.465 -16.238 0.597 1.00 1.30
ATOM 29 CG GLU 4 2.336 -17.264 0.479 1.00 1.20
ATOM 30 CD GLU 4 2.929 -18.671 0.391 1.00 1.10
ATOM 31 OE1 GLU 4 4.056 -18.846 0.823 1.00 1.00
ATOM 32 OE2 GLU 4 2.246 -19.551 -0.108 1.00 0.90
TER
PARENT N/A
ATOM 17 N VAL 3 6.308 -12.396 -0.278 1.00 1.70
ATOM 18 CA VAL 3 5.190 -12.030 -1.187 1.00 1.70
ATOM 19 C VAL 3 3.954 -12.870 -0.844 1.00 1.70
ATOM 20 O VAL 3 2.834 -12.471 -1.090 1.00 1.70
ATOM 21 CB VAL 3 5.608 -12.274 -2.641 1.00 1.70
ATOM 22 CG1 VAL 3 5.542 -13.771 -2.959 1.00 1.70
ATOM 23 CG2 VAL 3 4.664 -11.514 -3.573 1.00 1.70
TER
END
Example 2. Residue-Residue contact prediction
The flexibility offered by the new format allows algorithms
parameterized to predict any distance range to be used.
Below is an example of how to use the new residue-residue separation
distance format to submit a prediction of residue contacts defined as Cb-Cb
distances < 8 A.
PFRMAT RR
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC # <- entire target sequence (up to 50
HDFGRTYIWQMSDASHMD # residues per line)
1 8 0 8 0.720
1 10 0 8 0.715 # <- i=1 j=10: indices of residues (integers),
31 38 0 8 0.710
10 20 0 8 0.690 # <- d1=0 d2=8: the range of Cb-Cb distance
30 37 0 8 0.678 # predicted for the residue pair (i,j)
11 29 0 8 0.673
1 9 0 8 0.63 # <- p=0.63: probability of the residues i=1 and j=9
21 37 0 8 0.502 # being in contact (in descending order)
8 15 0 8 0.401
3 14 0 8 0.400
5 15 0 8 0.307
7 14 0 8 0.30
END
Example 3. An alternative alignment format for Threading/Fold Recognition
predictions
Alignments will be converted into 3D structures.
(A) Format to express unambiguous alignments to PDB entries 'mabc_A' and 'nefg'.
PFRMAT AL
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_A
M 21 V 11
P 22 D 12
N 23 A 12A
F 24 F 12B
A 25 L 13
P 32 D 22
N 33 A 23
F 34 F 24
A 35 L 25
TER
PARENT nefg
E 75 T 73
T 76 T 74
V 77 A 75
D 78 D 76
G 79 D 77
R 80 R 78
TER
END
(B) Format to express unambiguous alignments to PDB entry 'mabc_D'.
An example of how to use the AL format to submit a prediction of
the target with a chain name of 'A'.
PFRMAT AL
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_D
M A21 V 11
P A22 D 12
N A23 A 12A
F A24 F 12B
A A25 L 13
P A32 D 22
N A33 A 23
F A34 F 24
A A35 L 25
TER
END
Example 4. Multichain predictions
(A) An example of 3D atomic coordinates model prediction.
PFRMAT TS
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM 1 N GLU A 1 22.576 19.032 -5.026 1.00 0.00
ATOM 2 CA GLU A 1 22.879 20.313 -4.321 1.00 0.00
ATOM 3 CB GLU A 1 22.285 21.478 -5.449 1.00 0.00
ATOM 4 CG GLU A 1 23.018 21.946 -6.707 1.00 0.00
ATOM 5 CD GLU A 1 24.351 22.625 -6.434 1.00 0.00
ATOM 6 OE1 GLU A 1 25.379 21.908 -6.380 1.00 0.00
ATOM 7 OE2 GLU A 1 24.381 23.879 -6.291 1.00 0.00
ATOM 8 O GLU A 1 22.237 20.962 -2.117 1.00 0.00
ATOM 9 C GLU A 1 21.857 20.684 -3.261 1.00 0.00
ATOM 10 N VAL A 2 20.585 20.675 -3.601 1.00 0.00
ATOM 11 CA VAL A 2 19.530 21.006 -2.624 1.00 0.00
ATOM 12 CB VAL A 2 18.277 21.590 -3.319 1.00 0.00
ATOM 13 CG1 VAL A 2 17.182 21.859 -2.270 1.00 0.00
ATOM 14 CG2 VAL A 2 18.656 22.833 -4.079 1.00 0.00
ATOM 15 O VAL A 2 18.770 18.750 -2.603 1.00 0.00
ATOM 16 C VAL A 2 19.096 19.721 -1.933 1.00 0.00
ATOM 17 N HIS A 3 19.115 19.700 -0.603 1.00 0.00
ATOM 18 CA HIS A 3 18.780 18.489 0.122 1.00 0.00
ATOM 19 CB HIS A 3 19.559 18.445 1.410 1.00 0.00
ATOM 20 CG HIS A 3 21.015 18.684 1.224 1.00 0.00
ATOM 21 CD2 HIS A 3 21.767 19.803 1.367 1.00 0.00
ATOM 22 ND1 HIS A 3 21.851 17.721 0.702 1.00 0.00
ATOM 23 CE1 HIS A 3 23.072 18.220 0.589 1.00 0.00
ATOM 24 NE2 HIS A 3 23.048 19.478 0.985 1.00 0.00
ATOM 25 O HIS A 3 16.777 19.181 1.220 1.00 0.00
ATOM 26 C HIS A 3 17.296 18.417 0.409 1.00 0.00
REMARK
REMARK Predictors should NOT use TER separator between chains
REMARK
ATOM 1321 N GLU B 1 -22.603 -17.981 -4.847 1.00 0.00
ATOM 1322 CA GLU B 1 -22.889 -19.285 -4.180 1.00 0.00
ATOM 1323 CB GLU B 1 -22.342 -20.410 -5.372 1.00 0.00
ATOM 1324 CG GLU B 1 -23.122 -20.828 -6.619 1.00 0.00
ATOM 1325 CD GLU B 1 -24.447 -21.511 -6.324 1.00 0.00
ATOM 1326 OE1 GLU B 1 -25.468 -20.792 -6.207 1.00 0.00
ATOM 1327 OE2 GLU B 1 -24.479 -22.769 -6.227 1.00 0.00
ATOM 1328 O GLU B 1 -22.172 -20.020 -2.026 1.00 0.00
ATOM 1329 C GLU B 1 -21.830 -19.701 -3.172 1.00 0.00
ATOM 1330 N VAL B 2 -20.572 -19.685 -3.557 1.00 0.00
ATOM 1331 CA VAL B 2 -19.485 -20.056 -2.630 1.00 0.00
ATOM 1332 CB VAL B 2 -18.260 -20.619 -3.392 1.00 0.00
ATOM 1333 CG1 VAL B 2 -17.131 -20.932 -2.393 1.00 0.00
ATOM 1334 CG2 VAL B 2 -18.674 -21.832 -4.184 1.00 0.00
ATOM 1335 O VAL B 2 -18.711 -17.807 -2.553 1.00 0.00
ATOM 1336 C VAL B 2 -19.020 -18.800 -1.909 1.00 0.00
ATOM 1337 N HIS B 3 -18.990 -18.829 -0.580 1.00 0.00
ATOM 1338 CA HIS B 3 -18.623 -17.648 0.178 1.00 0.00
ATOM 1339 CB HIS B 3 -19.356 -17.649 1.494 1.00 0.00
ATOM 1340 CG HIS B 3 -20.819 -17.875 1.353 1.00 0.00
ATOM 1341 CD2 HIS B 3 -21.571 -18.995 1.480 1.00 0.00
ATOM 1342 ND1 HIS B 3 -21.667 -16.890 0.896 1.00 0.00
ATOM 1343 CE1 HIS B 3 -22.894 -17.378 0.809 1.00 0.00
ATOM 1344 NE2 HIS B 3 -22.864 -18.650 1.156 1.00 0.00
ATOM 1345 O HIS B 3 -16.586 -18.389 1.177 1.00 0.00
ATOM 1346 C HIS B 3 -17.129 -17.592 0.414 1.00 0.00
TER
END
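The rule stated in the REMARK lines above (no TER separator between chains) can be checked automatically. The Python sketch below assumes whitespace-separated ATOM fields with the chain identifier in the fifth field, as the example is printed here; a real PDB-style file uses fixed columns, so the field extraction would need adapting. check_multichain_ts is a hypothetical name.

```python
def check_multichain_ts(text):
    """Return (chain_ids_in_order, ter_between_chains) for the first model."""
    chains = []
    seen_ter = False
    ter_between = False
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "ATOM":
            if seen_ter:
                # An ATOM record after a TER means TER was used as a separator.
                ter_between = True
            chain = fields[4]  # assumption: chain ID is the fifth field
            if chain not in chains:
                chains.append(chain)
        elif fields[0] == "TER":
            seen_ter = True
        elif fields[0] == "END":
            break
    return chains, ter_between
```

For the example above, this check would report chains A and B with no TER between them.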
(B) An example of how to use the RR format to submit a prediction of
interchain (chains A and B) residue-residue contacts defined as Cb-Cb
distances < 8 A.
PFRMAT RR
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC # <- entire target sequence (up to 50
HDFGRTYIWQMSD # residues per line)
A1 B9 0 8 0.70
A1 B10 0 8 0.70 # <- indices of residues: Ai and Bj,
A1 B12 0 8 0.60 # <- the range of Cb-Cb distance predicted
A1 B14 0 8 0.20 # for the residue pair: d1 and d2 (real),
A1 B15 0 8 0.10 # <- probability of the distance between
A1 B17 0 8 0.30 # Cb atoms being within the specified
A1 B19 0 8 0.50 # range: p (real)
A2 B8 0 8 0.90
A3 B7 0 8 0.70
A3 B12 0 8 0.40
A3 B14 0 8 0.70
A3 B15 0 8 0.30
A4 B6 0 8 0.90
A7 B14 0 8 0.30
A9 B14 0 8 0.50
END
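In the interchain example above, each residue index carries a chain letter (Ai, Bj). A minimal Python sketch of splitting such a record into its parts follows. It assumes a single-letter chain identifier followed directly by the residue number, and '#' as the start of an annotation; parse_interchain_contact is a hypothetical name.

```python
import re

# Assumption: one chain letter followed by an integer residue index, e.g. 'A1'.
_RES_ID = re.compile(r"([A-Za-z])(\d+)")


def parse_interchain_contact(line):
    """Return (chain_i, i, chain_j, j, d1, d2, p), or None if not a contact record."""
    fields = line.split("#", 1)[0].split()
    if len(fields) != 5:
        return None
    m1 = _RES_ID.fullmatch(fields[0])
    m2 = _RES_ID.fullmatch(fields[1])
    if not (m1 and m2):
        return None
    return (
        m1.group(1), int(m1.group(2)),
        m2.group(1), int(m2.group(2)),
        float(fields[2]), float(fields[3]), float(fields[4]),
    )
```

Header records and sequence lines do not match the residue pattern, so the function returns None for them and they can be skipped by the caller.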
Example 5. Order-disorder regions prediction
Example of order-disorder regions prediction.
PFRMAT DR
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
H D 0.70 # <- residue code,
L D 0.80 # <- order/disorder assignment code,
E D 0.80 # <- the number specifying the associated
G D 0.60 # confidence level: 0.5 - residue not predicted
S D 0.90 # >0.5 - disordered region
I O 0.50 # <0.5 - ordered region
G O 0.40
I O 0.40
L O 0.30
L O 0.50
K O 0.50
K O 0.30
H O 0.20
E O 0.20
I O 0.40
V O 0.45
F D 0.60
D D 0.90
G D 0.60
C D 0.80
END
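The DR records above can be read as (residue, code, confidence) triples. The Python sketch below also maps a confidence value to a class using the thresholds given in the annotations of the example (>0.5 disordered, <0.5 ordered, 0.5 not predicted). The names parse_dr and classify_confidence are hypothetical, and treating '#' as the start of an annotation is an assumption.

```python
def parse_dr(text):
    """Return a list of (residue, code, confidence) from DR-format text."""
    records = []
    in_model = False
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            in_model = True
        elif fields[0] == "END":
            break
        elif in_model and len(fields) == 3 and fields[1] in ("D", "O"):
            records.append((fields[0], fields[1], float(fields[2])))
    return records


def classify_confidence(conf):
    """Map a confidence value to 'D', 'O', or None (residue not predicted)."""
    if conf > 0.5:
        return "D"
    if conf < 0.5:
        return "O"
    return None
```

Comparing each record's code with classify_confidence of its value highlights residues whose assignment and confidence disagree, or whose confidence is exactly 0.5.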
Example 6. Quality assessment prediction
(A) Global Model Quality Score
PFRMAT QA
TARGET T9999
AUTHOR 1234-5678-9000
METHOD Description of methods used
MODEL 1
QMODE 1
3D-JIGSAW_TS1 0.8
FORTE1_AL1.pdb 0.7
END
(B) Residue-based Quality Assessment (fragment of the table). Note that this case includes case (A), so there is no need to submit QMODE 1 predictions in addition to QMODE 2.
PFRMAT QA
TARGET T9999
AUTHOR 1234-5678-9000
REMARK Error estimate is CA-CA distance in Angstroms
METHOD Description of methods used
MODEL 1
QMODE 2
3D-JIGSAW_TS1 0.8 10.0 6.5 5.0 2.0 1.0 ...
FORTE1_AL1.pdb 0.7 8.0 5.5 4.5 X X ...
END
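Both QA examples above share one layout: a model name, a global score and, in QMODE 2, per-residue error estimates in which 'X' marks a residue with no estimate. A minimal Python sketch of reading such a file follows. It assumes one model per line with whitespace-separated fields, as in the examples; parse_qa is a hypothetical name.

```python
def parse_qa(text):
    """Return (qmode, {model_name: (global_score, per_residue_errors)}).

    In per_residue_errors, None stands for an 'X' entry (no estimate).
    For QMODE 1 the per-residue list is empty.
    """
    qmode = None
    scores = {}
    in_model = False
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "MODEL":
            in_model = True
        elif fields[0] == "QMODE":
            qmode = int(fields[1])
        elif fields[0] == "END":
            break
        elif in_model and qmode is not None and len(fields) >= 2:
            name = fields[0]
            global_score = float(fields[1])
            per_res = [None if f == "X" else float(f) for f in fields[2:]]
            scores[name] = (global_score, per_res)
    return qmode, scores
```

Because a QMODE 2 record begins with the same global score as a QMODE 1 record, the same reader covers both cases.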