Description of the experiment
The main goal of CASP is to obtain an in-depth and objective assessment of our current
abilities and inabilities in the area of protein structure prediction.
To this end, participants will predict as much as possible about a set of soon to be known
structures. These will be true predictions, not post-dictions made on
already known structures.
CASP7 will particularly address the following questions:
- Are the models produced similar to the corresponding experimental
structure?
- Is the mapping of the target sequence onto the proposed structure
(i.e. the alignment) correct?
- Have similar structures that a model can be based on been identified?
- Are comparative models more accurate than can be obtained by simply
copying the best template?
- Has there been progress from the earlier CASPs?
- What methods are most effective?
- Where can future effort be most productively focused?
Tertiary structure predictions. For CASP7, categories have been redefined
to reflect developments in methods. The 'Template based modeling' category will
include all former comparative modeling, homologous fold based models and some analogous
fold based models. As in CASP6, the 'Template free modeling' category will include models
of proteins with previously unseen folds and hard analogous fold based models.
High resolution models. This new category will include a subset of tertiary structure
models where the backbone is sufficiently accurate that
the details of side chains, loops, and active sites can be meaningfully assessed.
Particular attention will be paid to success in refining these models beyond
the quality obtained by simply copying a best template.
A separate assessor will judge these high accuracy modeling cases.
Other predictions. As in previous CASPs, we will be assessing
the ability of predictors to define boundaries of structural domains,
detect residue-residue contacts and identify disordered regions in target proteins.
Function prediction, a new category introduced in CASP6, will again be
assessed in CASP7. As suggested at the CASP6 meeting, we will also evaluate
the ability of predictors to judge the quality of models (without knowing the native structures) and
the reliability of the prediction for particular residues in the structure.
There will be additional activities included in or related to CASP7, which extend its scope.
Large-scale benchmarking: It is hoped that the results
of well-run benchmarking experiments such as EVA and LIVEBENCH
will also be discussed at the CASP meeting.
SAC-CASP7: As in 2002 and 2004, the CASP meeting will be joined
by the Student Addendum Conference. SAC-CASP7 will use the same
conference facilities as CASP7. It will start (tentatively) the day before CASP7 and
will last one day.
Registration for the experiment starts in April.
The first targets are expected to be available at the beginning
of May. The prediction season will run for approximately
three months from May through July.
The CASP meeting will
take place at the end of November, and approximately one
month before that, groups with the most accurate and interesting
predictions will receive invitations to give talks.
There will also be discussion of predictions and methods
on the
FORCASP web site.
Participation is open to all.
Intending participants, and those interested in receiving mailings
concerning progress of the experiment should
register for the experiment.
The predictors with servers are requested to register immediately as
we are planning on having a dry run for servers in mid-April.
For the experiment to succeed, it is essential that we obtain the help of the experimental
community. As in previous CASPs, we will invite protein crystallographers and NMR
spectroscopists to provide details of structures they expect to have made public before
October 1, 2006. A target submission form will be available at this web site in mid-April.
Prediction targets will be made available through this web site. All targets will be assigned
an expiry date, and predictions must be received and accepted before that expiration date.
Predictions must be submitted to this web site in
CASP format. For 3D coordinate predictions, this is a simple PDB-like file with consecutive
numbering of residues 1 -> N and a small number of required headers.
As in previous CASPs, independent assessors will evaluate the predictions.
Assessors will be provided with the results
of numerical evaluation of the predictions, and will judge the results
primarily on that basis. They will be asked to focus particularly on the
effectiveness of different methods. Numerical evaluation criteria will
as far as possible be similar to those used in previous CASPs, although
the assessors may be permitted to introduce some additional ones.
There are four assessors, representing
expertise in the template-based modeling, template-free modeling,
high accuracy modeling and function prediction:
Torsten Schwede (University of Basel, Switzerland) - for template based modeling
Neil Clarke (Genome Institute of Singapore) - for template free modeling
Randy Read (University of Cambridge, UK) - for high resolution modeling
Alfonso Valencia (CNB, Madrid) - for function prediction
In accordance with CASP policy, assessors are not directly
involved in the organization of the experiment, nor can they take part
in the experiment as predictors. Predictors must not
contact assessors directly with queries, but rather these should be sent to the
casp@predictioncenter.org
email address. A list of previous CASP assessors is also available on this web site.
All CASP predictions and evaluations will be made available through
this web site shortly before the meeting.
The proceedings of the meeting will be published.
All participants will also be encouraged to fully report their results
and methods on the
FORCASP web site. Contributions to the site will be discussed and scored
by other predictors, and this material will be taken into account in choosing
some presentations at the meeting.
A meeting to evaluate the results of the prediction experiment will be held
at the
Asilomar Conference Center (Pacific Grove, California, USA) on November 26-30, 2006.
The meeting will be limited to about 200 participants
and precedence will be given to active predictors.
Some financial assistance will be available for the most successful predictors.
John Moult,
CARB, University of Maryland, USA
Krzysztof Fidelis,
University of California, Davis, USA
Tim Hubbard,
Wellcome Trust Sanger Institute, Hinxton, UK
Andriy Kryshtafovych,
University of California, Davis, USA
Burkhard Rost,
Columbia University, New York, USA
Anna Tramontano,
University of Rome, Italy
General rules
- Server models should be returned automatically
to servers AT predictioncenter.org
following a query from CASP7 distribution server
- Predictions for CASP7 may be submitted in 7 separate formats:
TS # 3D atomic coordinates (Tertiary Structure) prediction
AL # Format to express unambiguous ALignments to PDB entries
RR # Residue-Residue separation distance prediction
DR # Order-Disorder Regions prediction
DP # Domain boundary prediction
FN # Function prediction
QA # Quality assessment
- One team may make a prediction of a target by submitting
up to five models in TS/AL, RR, DR, DP, FN and QA formats (models in AL
format are considered equivalent to those in TS
format and will be translated to TS internally before evaluation).
Most of the evaluation and assessment will focus on the model labeled '1'
(model index 1, see MODEL record).
- Each submission may contain only one of the seven format categories.
- Submission of each model begins with PFRMAT and ends with END record.
- Each submission may contain only one model, beginning with the MODEL record,
ending with END, and no target residue repetitions.
- Submission of a duplicate model (same target, format category, group, model
index) will replace previously accepted model, provided it is
received before the target has expired.
Note:
models in AL format are considered equivalent to those in TS format.
- Each submitted model is automatically verified by the format verification
server.
Only accepted models will be assigned an ACCESSION CODE.
A unique ACCESSION CODE is composed from the number of the target, prediction
format category, prediction group number, and model index.
Examples:
Accession code T0044TS005_2 has the following components:
T0044 target number
TS Tertiary Structure (PFRMAT TS)
005 prediction group 5
2 model index 2 (by default considered as FINAL/REFINED)
Accession code T0044TS005_2u has the following components:
T0044 target number
TS Tertiary Structure (PFRMAT TS)
005 prediction group 5
2u model index 2 UNREFINED set of coordinates
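The composition of an accession code can be illustrated with a short parser. This is only a sketch: the pattern is inferred from the two examples above (a three-digit group number, an underscore before the model index, an optional trailing "u"), and the names ACCESSION_RE, Accession and parse_accession are hypothetical, not part of any CASP software.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the examples: target, two-letter format code,
# three-digit group number, "_", model index 1-5, optional "u" for UNREFINED.
ACCESSION_RE = re.compile(r"^(T\d+)(TS|AL|RR|DR|DP|FN|QA)(\d{3})_([1-5])(u?)$")


class Accession(NamedTuple):
    target: str
    fmt: str
    group: int
    model: int
    unrefined: bool


def parse_accession(code: str) -> Accession:
    """Split an accession code into its components."""
    match = ACCESSION_RE.match(code)
    if match is None:
        raise ValueError(f"not a recognised accession code: {code!r}")
    target, fmt, group, model, flag = match.groups()
    return Accession(target, fmt, int(group), int(model), flag == "u")
```

For instance, parse_accession("T0044TS005_2u") yields target "T0044", format "TS", group 5, model index 2, with the unrefined flag set.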
Format description
All submissions contain records described below.
Each of these records must begin with a standard keyword.
In all submissions standard keywords must
begin in the first column of a record.
The keyword set is as follows:
PFRMAT Format specification code: TS , AL , RR , DR, DP, FN, QA
TARGET Target identifier from the CASP7 target table
AUTHOR XXXX-XXXX-XXXX Registration code of the Group Leader
SCORE Reliability of the model (optional)
REMARK Comment record (may appear anywhere after the first 3 required lines, optional)
METHOD Records describing the methods used
MODEL Beginning of the data section for the submitted model
PARENT Specifies structure template used to generate the TS/AL model
TER Terminates independent segments of structure in the TS/AL model
END End of the submitted model
Models should be submitted in Plain Text format.
One model per submission is a rule for 'human groups' and a recommendation
for servers. Servers can also reply to our request with a single email containing
all models one after the other. Only the first 5 models will be considered.
PLEASE DON'T USE 'tab' AS A SEPARATOR. PLEASE USE 'space' INSTEAD.
Record PFRMAT should appear on the first line of the prediction and
is used for all submissions.
PFRMAT TS
TS indicates that submission contains 3D atomic coordinates
in standard PDB format
PFRMAT RR
RR indicates that submission contains residue-residue
separation distance prediction
PFRMAT AL
AL indicates that submission contains unambiguous alignments
to PDB entries
PFRMAT DR
DR indicates that submission contains order-disorder regions
prediction
PFRMAT DP
DP indicates that submission contains domain prediction
PFRMAT FN
FN indicates that submission contains function prediction
PFRMAT QA
QA indicates quality ranking submission
Record TARGET should appear on the second line of the prediction and
is used for all submissions.
TARGET Txxxx
Txxxx indicates id of the target predicted.
Record AUTHOR should appear on the third line of the prediction
and is used for all submissions.
For human groups:
AUTHOR XXXX-XXXX-XXXX
XXXX-XXXX-XXXX indicates the Group Leader's registration code.
This code is the prediction submission code obtained upon
CASP7 group registration.
Members of prediction groups who intend to submit predictions
should use the registration code of the Group Leader for all
predictions submitted by that group.
For server groups:
AUTHOR MY_GROUP_NAME
where MY_GROUP_NAME is a name selected for the group at registration.
Alternative way of identification for server groups:
REMARK AUTHOR MY_GROUP_NAME
SCORE Optional. This record may be used to report a model
reliability score. It will not influence the evaluation.
REMARK Optional. PDB style 'REMARK' records may be used
anywhere in the submission. These records may contain any
text and will in general not influence evaluation.
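Before submitting, the three required leading records can be checked locally. The sketch below is not the official format verification server; check_header is a hypothetical helper that applies only the rules stated above (PFRMAT, TARGET and AUTHOR on lines 1 to 3, no tab characters). It assumes that group names contain no spaces and that the "REMARK AUTHOR" alternative may occupy the third line.

```python
from typing import List

FORMATS = {"TS", "AL", "RR", "DR", "DP", "FN", "QA"}


def check_header(text: str) -> List[str]:
    """Return a list of problems found in the first three records."""
    problems = []
    if "\t" in text:
        problems.append("tab character found; use spaces instead")
    lines = text.splitlines()
    if len(lines) < 3:
        problems.append("fewer than three records")
        return problems
    pfrmat = lines[0].split()
    if len(pfrmat) != 2 or pfrmat[0] != "PFRMAT" or pfrmat[1] not in FORMATS:
        problems.append("line 1 must be 'PFRMAT <format code>'")
    target = lines[1].split()
    if len(target) != 2 or target[0] != "TARGET":
        problems.append("line 2 must be 'TARGET <target id>'")
    author = lines[2].split()
    if author[:2] == ["REMARK", "AUTHOR"]:
        author = author[1:]  # alternative identification for server groups
    if len(author) != 2 or author[0] != "AUTHOR":
        problems.append("line 3 must be 'AUTHOR <code or group name>'")
    return problems
```

An empty list means the header passed these basic checks; it does not guarantee acceptance by the verification server.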
Records METHOD are used for all submissions.
These records describe the methods used. Predictors are urged to provide
as full a description of the methods as possible, including references,
data libraries used, and values of default and non-default parameters.
These descriptions will be made available via the Prediction Center WEB
pages as well as printed along with the other materials distributed at the
meeting. Length of 100 - 500 words is suggested.
Record MODEL is used for all submissions.
Signifies the beginning of model data (3D atomic coordinates, an unambiguous
alignment to a PDB entry, residue-residue separation distance prediction and
order-disorder region predictions).
MODEL n [REFINED|UNREFINED]
n Model index n is used to indicate predictor's ranking
according to her/his belief which model is closest to the
target structure (1 <= n <= 5). Model index is included
automatically in the ACCESSION CODE. All models with index
higher than 5 will be discarded.
REFINED The set of coordinates labeled REFINED will be considered
as a final model (to allow the evaluation of the results
of an automated refinement process, such as molecular
dynamics). Models submitted without any label: REFINED or
UNREFINED will be considered by default as final.
UNREFINED Coordinates labeled UNREFINED will be compared only to
the final set (REFINED) with the same model index n, to
evaluate the effectiveness of the refinement method. If
UNREFINED model is submitted, a REFINED model must be
submitted as well. The letter "u" will be added to the
model index in the ACCESSION CODE of the UNREFINED model.
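The rules for the MODEL record can be summarised in code. This is a minimal sketch with a hypothetical function name; it assumes fields are separated by spaces and treats a missing label as REFINED, as described above.

```python
from typing import Tuple


def parse_model_record(line: str) -> Tuple[int, bool]:
    """Return (model index, is_refined) for a MODEL record."""
    fields = line.split()
    if len(fields) not in (2, 3) or fields[0] != "MODEL":
        raise ValueError(f"not a MODEL record: {line!r}")
    index = int(fields[1])
    if not 1 <= index <= 5:
        # Models with an index higher than 5 are discarded.
        raise ValueError(f"model index out of range 1-5: {index}")
    label = fields[2] if len(fields) == 3 else "REFINED"  # default is final
    if label not in ("REFINED", "UNREFINED"):
        raise ValueError(f"unknown label: {label!r}")
    return index, label == "REFINED"
```

A submitter could use the second element of the result to confirm that every UNREFINED model has a REFINED counterpart with the same index.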
Record PARENT is used for all submissions in the TS (and AL)
format.
PARENT record indicates structure templates used to generate any independent
segment of MODEL (see description of the TS format below).
The PARENT record should be placed as the first record of any such independent
segment. Only one PARENT record per structure segment is allowed.
PARENT N/A
Indicates an ab initio prediction, not directly based on any known
structure. Note that this is the only indication in the file that the
prediction is ab initio, so is a critical piece of information.
PARENT NONE [n1 n2]
Indicates that the predictor believes that there is no structure in
the present PDB that is close enough to be used as a template. This
is an entry requested by those predictors who use threading and
sequence comparison methods. With structural genomics projects being
designed to determine the structure of proteins with novel folds, the
ability to predict when a fold is unknown is becoming increasingly
important, and predictors are urged to make such submissions.
Delimiters n1 n2 indicate the range of the target sequence predicted
as having no homologue in the current PDB.
Omission of n1 n2 indicates the entire target (see Example 1 (C)).
PARENT mabc_A
Indicates that the model or the independent segment of structure is
based on a single PDB entry mabc chain A (use _A to indicate chain A).
Most threading and sequence search submissions would now be submitted
with this form of the PARENT record. A comparative modeler using a
single parent structure would also use this form. Note that, in order
to be accepted, the code must correspond to a current PDB entry.
PARENT mcdc ndef_g [ohij_k ...]
Is used only in comparative modeling and indicates that the model is
based on more than one structure template. Up to five PDB chains
may be listed here with additional detailed information included in
the METHOD records. In threading and sequence search, subdomains of
the target structure found to correspond to different known folds
should be submitted as independent segments of structure with
reference to only one PDB chain per segment.
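The four forms of the PARENT record can be distinguished as sketched below. The function name and the returned dictionary layout are hypothetical; the code assumes four-character PDB codes with an optional "_X" chain suffix, and it does not check that a code corresponds to a current PDB entry.

```python
from typing import Any, Dict


def parse_parent(line: str) -> Dict[str, Any]:
    """Classify a PARENT record as ab initio, NONE, or template based."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "PARENT":
        raise ValueError(f"not a PARENT record: {line!r}")
    args = fields[1:]
    if args == ["N/A"]:
        return {"kind": "ab_initio"}
    if args[0] == "NONE":
        if len(args) == 1:
            return {"kind": "none", "range": None}  # entire target
        if len(args) == 3:
            return {"kind": "none", "range": (int(args[1]), int(args[2]))}
        raise ValueError("PARENT NONE takes either no range or n1 n2")
    if len(args) > 5:
        raise ValueError("at most five PDB chains may be listed")
    templates = []
    for arg in args:
        code, _, chain = arg.partition("_")
        if len(code) != 4:
            raise ValueError(f"not a four-character PDB code: {arg!r}")
        templates.append((code, chain or None))
    return {"kind": "template", "templates": templates}
```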
Record TER is used to terminate an independent segment of structure
(PFRMAT TS and PFRMAT AL).
TER
3D atomic coordinates (PFRMAT TS).
Standard PDB atom records are used for the atomic coordinates. The submission
format requires records that are 80 columns long, padded with spaces
where needed (see target template PDB files as provided in specific target
descriptions available through the
CASP7 target table). This requirement is
necessitated by some of the software used in the evaluation of predictions.
Coordinates for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
It is requested that coordinate data be supplied for at least all
non-hydrogen main chain atoms, i.e. the N, CA, C and O atoms of every residue.
Specifically, if only CA atoms are predicted by the method, predictors are
encouraged to build the main chain atoms for every residue before submission
to CASP. One program that can make such a conversion is the
MaxSprout server of Liisa Holm and co-workers. (If only CA atoms were submitted it would not be
possible to run most of the analysis software, which would severely limit the
evaluation of that prediction.)
When multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common
frame of reference between the segments. For any given MODEL, no target
residue may be repeated among all such independent structure
segments. Potential multi-domain
nature of targets will be addressed in the evaluation even if the prediction
is made in a single frame of reference (i.e. without separation into multiple
segments of structure). For such predictions segmentation should only be
used to allow multiple model predictions (effectively up to 5 predictions
for each such domain).
Notes:
- atoms for which a prediction has been made must contain "1.0" in
the occupancy field; those for which no prediction is made must
either contain "0.0" in that field or be skipped altogether
- error estimates, in Angstroms, when given should be provided in the
temperature factor field
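The 80-column requirement and the use of the occupancy and temperature factor fields can be illustrated by a small formatter. This is a sketch under stated assumptions: the column layout is that of the standard PDB ATOM record, atom names shorter than four characters are placed starting in column 14 (the usual convention for protein atoms), and atom_record is a hypothetical helper name.

```python
def atom_record(serial: int, name: str, res_name: str, chain: str,
                res_seq: int, x: float, y: float, z: float,
                error: float = 0.0, occupancy: float = 1.0) -> str:
    """Format one ATOM record, padded with spaces to 80 columns.

    occupancy is 1.0 for predicted atoms; error (in Angstroms) is written
    to the temperature factor field.
    """
    # Names of up to three characters start in column 14.
    name_field = name.ljust(4) if len(name) == 4 else (" " + name).ljust(4)
    line = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name_field, res_name, chain, res_seq,
        x, y, z, occupancy, error)
    return line.ljust(80)


if __name__ == "__main__":
    print(atom_record(1, "N", "GLU", " ", 1, 10.982, -9.774, 1.377, error=0.50))
```

Pass a single space as the chain for single-chain targets and the chain ID from the template file for multichain targets.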
An unambiguous alignment to a PDB entry used for threading predictions
(PFRMAT AL).
Alignment for each model or an independent structure segment should begin
with a single PARENT record and terminate with a TER record (see above).
The (four column) alignment data records provide: target residue one
letter symbol, target residue sequence number, PDB residue one letter symbol,
and PDB residue sequence number with an insertion code if necessary
(see Example 3):
aa1 n1 aa2 n2
Note:
- residues for which no prediction is made must be skipped
- if a chain ID is specified in the PDB template of the target, then
the target residue sequence number should be composed of a chain ID
and residue number, e.g. A2, B44
The PDB code with chain extension of the structure the alignment is based on
should be placed in the PARENT record.
Only one PDB code per independent structure segment is allowed.
PDB codes should refer to structures containing at least the main chain atomic
coordinates (see the TS format).
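A four-column alignment record can be read as sketched below. The function name and result keys are hypothetical. The sketch assumes that the target residue number may carry a single-letter chain ID prefix (e.g. A21) and that the PDB residue number may carry a single-letter insertion code suffix (e.g. 12A), as described above.

```python
import re
from typing import Any, Dict

TARGET_NUM = re.compile(r"^([A-Za-z]?)(\d+)$")   # optional chain ID, number
PDB_NUM = re.compile(r"^(-?\d+)([A-Za-z]?)$")    # number, optional insertion code


def parse_al_record(line: str) -> Dict[str, Any]:
    """Split an 'aa1 n1 aa2 n2' alignment record into typed fields."""
    aa1, n1, aa2, n2 = line.split()  # raises ValueError unless four columns
    target = TARGET_NUM.match(n1)
    pdb = PDB_NUM.match(n2)
    if target is None or pdb is None or len(aa1) != 1 or len(aa2) != 1:
        raise ValueError(f"malformed alignment record: {line!r}")
    return {
        "target_aa": aa1,
        "target_chain": target.group(1) or None,
        "target_num": int(target.group(2)),
        "pdb_aa": aa2,
        "pdb_num": int(pdb.group(1)),
        "pdb_icode": pdb.group(2) or None,
    }
```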
As in the case of coordinate submissions,
when multiple independent segments of structure are used in a prediction,
they will be evaluated separately with no assumption of a common
frame of reference between the segments. For any given MODEL, no target
residue may be repeated among all such independent structure
segments. Potential multi-domain
nature of targets will be addressed in the evaluation even if the prediction
is made in a single frame of reference (i.e. without separation into multiple
segments of structure). For such predictions segmentation should only be
used to allow multiple model predictions (effectively up to 5 predictions
for each such domain).
Note:
The facility to translate sequence - structure alignments (AL format) into
standard PDB atom records (TS format) is available as an additional
AL2TS service.
Residue-Residue separation prediction (PFRMAT RR).
Data in this format are inserted between MODEL and END records of the
submission file.
Format for the predicted separation distance between pairs of residues.
The distance is defined as the separation between C-beta atoms (C-alpha for
glycine residues).
The (five column) RR format:
i j d1 d2 p
Notes (see Example 2):
- entire target sequence should be split over multiple lines with a
maximum of 50 residues per line
- for intrachain residue-residue contacts residue number indices
i and j should be used for distance specification (i < j), i.e.
only one diagonal of the separation matrix should be supplied
- the distances d1 and d2 (real numbers) should indicate the range of
Cb-Cb distance predicted for the residue pair (C-alpha for glycines)
- the real number p should range from 0.0 - 1.0 to indicate
probability of the distance falling within the predicted range
- residue 'contacts' (defined here - as in CASP2 - as Cb-Cb<8A) can be
predicted with this format as:
i j 0 8 p
- any pair NOT listed is assumed to be NOT considered by predictor
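The notes above translate into two small helpers: one that splits the target sequence into lines of at most 50 residues, and one that writes a single RR data line with i < j. Both function names are hypothetical, and the sketch covers single-chain targets with integer residue indices only.

```python
from typing import List


def sequence_lines(sequence: str, width: int = 50) -> List[str]:
    """Split the target sequence into lines of at most `width` residues."""
    return [sequence[k:k + width] for k in range(0, len(sequence), width)]


def rr_record(i: int, j: int, p: float, d1: float = 0, d2: float = 8) -> str:
    """Format one 'i j d1 d2 p' line; the default range 0-8 denotes a contact."""
    if i == j:
        raise ValueError("i and j must differ")
    if i > j:
        i, j = j, i  # only one diagonal of the separation matrix is supplied
    if d1 > d2:
        raise ValueError("d1 must not exceed d2")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in the range 0.0 - 1.0")
    return f"{i} {j} {d1:g} {d2:g} {p:.2f}"
```

Calling rr_record(9, 1, 0.7) swaps the indices and produces a line of the same form as the contact records in Example 2.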
Order-disorder regions prediction (PFRMAT DR).
Data in this format are inserted between MODEL and END
records of the submission file.
The (three column) format record consists of residue code, Order/Disorder
prediction code, and a number specifying the associated confidence level:
aa OD p
The symbols for
the 2 state order/disorder prediction are
'O'=order, 'D'=disorder.
Last column should indicate a probability of a residue being in the
disordered region. The value of this confidence level is in the
range of 0.0 - 1.0. The entire sequence of the target should
always be given. If parts cannot be predicted a probability of 0.5 should be
used (see Example 5).
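A DR data section can be generated from per-residue disorder probabilities as sketched below. The function name is hypothetical. Two assumptions are made that the description above does not fix: the O/D code is derived from the probability with a 0.5 cutoff, and residues that cannot be predicted (given as None, written with probability 0.5) are labeled 'O'.

```python
from typing import List, Optional, Sequence


def dr_records(sequence: str, probs: Sequence[Optional[float]],
               cutoff: float = 0.5) -> List[str]:
    """Build 'aa OD p' lines covering the entire target sequence."""
    if len(sequence) != len(probs):
        raise ValueError("one probability (or None) is needed per residue")
    records = []
    for aa, p in zip(sequence, probs):
        if p is None:
            p = 0.5  # residue could not be predicted
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in the range 0.0 - 1.0")
        code = "D" if p > cutoff else "O"  # assumed cutoff, see lead-in
        records.append(f"{aa} {code} {p:.2f}")
    return records
```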
Domain boundary prediction (PFRMAT DP).
Data in this format are inserted between MODEL and END
records of the submission file.
You may also specify an optional PARENT record if you used homologues
in assigning domains. There should be only one PARENT record per model, and
the order of parents should correspond to the domain numbers, i.e. the parent listed
first corresponds to the domain you assigned as number 1, and so on.
The format record consists of consecutive residue number n ,
residue code aa, domain number D and reliability score p
(a real number between 0 (unreliable) and 1 (sure), optional).
n aa D p
The domain numbers are Arabic numerals going from 1
(for the first domain) to N for the N-th domain
(which allows split domains to be easily coded).
Put a dash '-' instead of a domain number if you cannot predict
the domain for a particular residue.
(see Example 6).
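The 'n aa D p' records can be produced from a per-residue domain assignment as sketched below. The function name is hypothetical; None stands for a residue whose domain cannot be predicted and is written as a dash, and the optional reliability score is simply omitted when no scores are given.

```python
from typing import List, Optional, Sequence


def dp_records(sequence: str, domains: Sequence[Optional[int]],
               scores: Optional[Sequence[float]] = None) -> List[str]:
    """Build domain boundary records with consecutive residue numbering."""
    if len(sequence) != len(domains):
        raise ValueError("one domain number (or None) is needed per residue")
    if scores is not None and len(scores) != len(sequence):
        raise ValueError("one score is needed per residue")
    records = []
    for n, (aa, dom) in enumerate(zip(sequence, domains), start=1):
        record = f"{n} {aa} {'-' if dom is None else dom}"
        if scores is not None:
            record += f" {scores[n - 1]:.2f}"
        records.append(record)
    return records
```

Because each residue carries its own domain number, a split domain is expressed by reusing the same number in non-adjacent stretches.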
Protein function prediction (PFRMAT FN).
Data are inserted between MODEL and END records of the submission file
(see Example 7 at the bottom of the page).
The data consist of four lines, each line starting with one of the
following keywords
GO Molecular Function:
EC number:
Binding site:
Prediction techniques:
and any additional number of lines starting with the keyword
Comment:
The format for each of the lines is described below (angle brackets designate
optional/additional data and should not be included in the prediction; a semicolon
separates several entries on one row, e.g. different GO functions or
different binding sites, etc. ; comma separates entries within the same logical block,
e.g. numbers of residues within the same binding site or numbers of residues related
to GO category):
GO Molecular Function: N1 <, res1 - res2 ; N2 , res3-res4 ; ...>
** Ni is a Gene Ontology identifier; resi (i=1, 2, ...) is a residue number in the target
EC number: N1.N2.N3.N4 <, res1-res2 ; M1.M2.M3.M4 , res3-res4 ; ...>
where Ni (Mi), i=1-4, are integer numbers from the
Enzyme Nomenclature Table
** Non enzymes should be labeled as 0.0.0.0
Binding site: res1, res2, ...
  or
Binding site: res1 - res2, <res3 - res4>, ...
** Residues considered as binding sites are those in direct contact with
heteroatoms bound in the structure of the target protein. For the purposes of binding site residues predictors should be aiming to predict residues that have any atom in contact with the ligand at a distance of 0.5A plus the van der Waals radii.
For example, under this definition the vast majority of single magnesium atoms are in contact with 2-4 residues per chain and ATP is usually bound by 11-18 residues per chain.
Over-prediction of binding residues will not be advantageous.
Prediction techniques: N1.N2.N3.N4.N5.N6
where Ni (i=1-6) is either 0 (for "not used") or 1 (for "used") in a vector
of six numbers (e.g. 1.1.0.0.1.0) corresponding to:
N1 Sequence analysis
N2 Feature based predictions (e.g. sequence composition, posttranslational
modifications, etc.)
N3 Predictions based on structural information
N4 Text mining and information extraction
N5 GO database (used in any way other than for deducing the numbers for submission)
N6 Manual annotation
Comment: free text
** The predictors are encouraged to use this section to include the description of their predictions (eg EC name, GO definition). Although this section will not be evaluated it might be useful in the case of any changes in GO codes and it will provide useful information for the next function evaluation.
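The six-number vector on the "Prediction techniques:" line can be assembled as in the following sketch. The short technique labels and the function name are hypothetical and chosen only for this illustration; their order follows N1 to N6 as listed above.

```python
from typing import Iterable

# Hypothetical labels, in the order N1..N6 given in the format description.
TECHNIQUES = ("sequence", "feature", "structure",
              "text_mining", "go_database", "manual")


def techniques_line(used: Iterable[str]) -> str:
    """Return the 'Prediction techniques:' line for the techniques used."""
    used = set(used)
    unknown = used - set(TECHNIQUES)
    if unknown:
        raise ValueError(f"unknown technique labels: {sorted(unknown)}")
    flags = ("1" if name in used else "0" for name in TECHNIQUES)
    return "Prediction techniques: " + ".".join(flags)
```

With sequence analysis, feature based predictions and the GO database marked as used, the function returns the vector 1.1.0.0.1.0 quoted above.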
Quality assessment prediction (PFRMAT QA).
Data are inserted between MODEL and END records of the submission file.
You may submit your quality assessment prediction in one of the two different modes:
QMODE 1 : global model quality score (MQS - one number for a model)
QMODE 2 : MQS and error estimate on per-residue basis.
The first line of data should specify mode identifier, i.e. QMODE (see Example 8).
In both modes, the first column in each line contains model identifier (file name of the
accepted 3D prediction).
The second column contains reliability score for a model as a whole.
The reliability score is a real number between 0.0 and 1.0 (1.0 being a perfect model).
If you don't provide MQS for a model please put "X" in the corresponding place.
If you don't want to additionally provide error estimates on per residue basis
(QMODE 1), your data table will consist of these two columns only.
If you do additionally provide residue error estimates (QMODE 2),
each consecutive column should contain error estimate in Angstroms for all the
consecutive residues in the target (i.e., column 3 corresponds to residue 1 in
the target, column 4 - to residue 2 and so on). This way data constitute a table
(Number_of_models_for_the_target) BY (Number_of_residues_in_the_target + 1).
Do not skip columns if you are not predicting error estimates for some residues -
instead put "X" in the corresponding column.
Please specify in the REMARK records what you consider to be an error estimate for a residue
(CA location error, geometrical center error, etc.).
Note. Please, be advised that a QA record line may be very long and then some
editors/mailing programs may force line wrap potentially causing unexpected parsing errors.
To avoid this problem we recommend that you split long lines into shorter sublines
(50-100 columns of data) by yourself. Our parser will consider consecutive sublines
(starting with the line containing evaluated model name and ending with the line
containing the next model name or tag END) a part of the same logical line.
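The joining of sublines into logical lines can be sketched as follows. This is not the Prediction Center parser. It assumes that a new logical line is recognised by a first token that is neither "X" nor a number (i.e. a model file name), and that the QMODE and END lines are passed in along with the data. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def _is_value(token: str) -> bool:
    """True for 'X' or anything that parses as a real number."""
    if token == "X":
        return True
    try:
        float(token)
    except ValueError:
        return False
    return True


def join_qa_lines(lines: Iterable[str]) -> Dict[str, List[Optional[float]]]:
    """Map each model name to its values; 'X' entries become None."""
    records: Dict[str, List[Optional[float]]] = {}
    current = None
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] in ("QMODE", "END"):
            continue
        if not _is_value(tokens[0]):
            current = tokens[0]  # a model name starts a new logical line
            records[current] = []
            tokens = tokens[1:]
        if current is None:
            raise ValueError("data found before the first model name")
        records[current].extend(
            None if tok == "X" else float(tok) for tok in tokens)
    return records
```

In the result, the first value for each model is the MQS and, in QMODE 2, the remaining values are the per-residue error estimates in target order.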
END record is used for all predictions and indicates the end of a
single model submission.
Predictions of multichain targets.
Atomic coordinates should contain chain IDs as provided in template files.
In residue-residue contact predictions residue
indices should be composed of chain ID and residue number, e.g. A2, B44
(see Example 4B).
Example 1. Atomic coordinates (Tertiary Structure)
The primary CASP7 format used for comparative modeling, threading and ab initio
submission categories.
(A) An example of comparative modeling prediction. As this model is labeled
UNREFINED, submission of a
REFINED model is also required.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1 UNREFINED
PARENT 1abc 1def_A
ATOM 1 N GLU 1 10.982 -9.774 1.377 1.00 0.50
ATOM 2 CA GLU 1 9.623 -9.833 1.984 1.00 0.50
ATOM 3 C GLU 1 8.913 -11.104 1.521 1.00 0.50
ATOM 4 O GLU 1 9.187 -11.630 0.461 1.00 0.50
ATOM 5 CB GLU 1 8.814 -8.614 1.546 1.00 0.50
ATOM 6 CG GLU 1 7.372 -8.754 2.039 1.00 0.50
ATOM 7 CD GLU 1 7.339 -8.625 3.562 1.00 0.50
ATOM 8 OE1 GLU 1 8.370 -8.307 4.131 1.00 0.50
ATOM 9 OE2 GLU 1 6.284 -8.846 4.132 1.00 0.50
ATOM 10 N THR 2 7.998 -11.599 2.304 1.00 1.60
ATOM 11 CA THR 2 7.266 -12.832 1.907 1.00 1.60
ATOM 12 C THR 2 6.096 -12.456 1.005 1.00 1.60
ATOM 13 O THR 2 5.008 -12.217 1.466 1.00 1.60
ATOM 14 CB THR 2 6.731 -13.533 3.157 1.00 1.60
ATOM 15 OG1 THR 2 7.662 -13.379 4.220 1.00 1.60
ATOM 16 CG2 THR 2 6.526 -15.019 2.864 1.00 1.60
ATOM 17 N VAL 3 6.308 -12.396 -0.278 1.00 1.70
ATOM 18 CA VAL 3 5.190 -12.030 -1.187 1.00 1.70
ATOM 19 C VAL 3 3.954 -12.870 -0.844 1.00 1.70
ATOM 20 O VAL 3 2.834 -12.471 -1.090 1.00 1.70
ATOM 21 CB VAL 3 5.608 -12.274 -2.641 1.00 1.70
ATOM 22 CG1 VAL 3 5.542 -13.771 -2.959 1.00 1.70
ATOM 23 CG2 VAL 3 4.664 -11.514 -3.573 1.00 1.70
ATOM 24 N GLU 4 4.146 -14.029 -0.272 1.00 1.70
ATOM 25 CA GLU 4 2.976 -14.882 0.086 1.00 1.60
ATOM 26 C GLU 4 2.153 -14.190 1.175 1.00 1.50
ATOM 27 O GLU 4 0.942 -14.141 1.109 1.00 1.40
ATOM 28 CB GLU 4 3.465 -16.238 0.597 1.00 1.30
ATOM 29 CG GLU 4 2.336 -17.264 0.479 1.00 1.20
ATOM 30 CD GLU 4 2.929 -18.671 0.391 1.00 1.10
ATOM 31 OE1 GLU 4 4.056 -18.846 0.823 1.00 1.00
ATOM 32 OE2 GLU 4 2.246 -19.551 -0.108 1.00 0.90
TER
END
(B) A model consisting of 2 independent structure segments (could be a target
modeled from two PDB domains, where relative orientation is unknown;
could be 2 fragments predicted by ab initio methods - ab initio example shown).
In a single MODEL no residue should appear twice among all such segments.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM 1 N GLU 1 10.982 -9.774 1.377 1.00 0.50
ATOM 2 CA GLU 1 9.623 -9.833 1.984 1.00 0.50
ATOM 3 C GLU 1 8.913 -11.104 1.521 1.00 0.50
ATOM 4 O GLU 1 9.187 -11.630 0.461 1.00 0.50
ATOM 5 CB GLU 1 8.814 -8.614 1.546 1.00 0.50
ATOM 6 CG GLU 1 7.372 -8.754 2.039 1.00 0.50
ATOM 7 CD GLU 1 7.339 -8.625 3.562 1.00 0.50
ATOM 8 OE1 GLU 1 8.370 -8.307 4.131 1.00 0.50
ATOM 9 OE2 GLU 1 6.284 -8.846 4.132 1.00 0.50
ATOM 10 N THR 2 7.998 -11.599 2.304 1.00 1.60
ATOM 11 CA THR 2 7.266 -12.832 1.907 1.00 1.60
ATOM 12 C THR 2 6.096 -12.456 1.005 1.00 1.60
ATOM 13 O THR 2 5.008 -12.217 1.466 1.00 1.60
ATOM 14 CB THR 2 6.731 -13.533 3.157 1.00 1.60
ATOM 15 OG1 THR 2 7.662 -13.379 4.220 1.00 1.60
ATOM 16 CG2 THR 2 6.526 -15.019 2.864 1.00 1.60
ATOM 24 N GLU 4 4.146 -14.029 -0.272 1.00 1.70
ATOM 25 CA GLU 4 2.976 -14.882 0.086 1.00 1.60
ATOM 26 C GLU 4 2.153 -14.190 1.175 1.00 1.50
ATOM 27 O GLU 4 0.942 -14.141 1.109 1.00 1.40
ATOM 28 CB GLU 4 3.465 -16.238 0.597 1.00 1.30
ATOM 29 CG GLU 4 2.336 -17.264 0.479 1.00 1.20
ATOM 30 CD GLU 4 2.929 -18.671 0.391 1.00 1.10
ATOM 31 OE1 GLU 4 4.056 -18.846 0.823 1.00 1.00
ATOM 32 OE2 GLU 4 2.246 -19.551 -0.108 1.00 0.90
TER
PARENT N/A
ATOM 17 N VAL 3 6.308 -12.396 -0.278 1.00 1.70
ATOM 18 CA VAL 3 5.190 -12.030 -1.187 1.00 1.70
ATOM 19 C VAL 3 3.954 -12.870 -0.844 1.00 1.70
ATOM 20 O VAL 3 2.834 -12.471 -1.090 1.00 1.70
ATOM 21 CB VAL 3 5.608 -12.274 -2.641 1.00 1.70
ATOM 22 CG1 VAL 3 5.542 -13.771 -2.959 1.00 1.70
ATOM 23 CG2 VAL 3 4.664 -11.514 -3.573 1.00 1.70
TER
END
(C) Threading/Fold Recognition prediction stating that target has no
homologue in the current PDB.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT NONE
TER
END
Example 2. Residue-Residue contact prediction
The flexibility offered by the new format allows algorithms
parameterized to predict any distance range to be used.
Below is an example of how to use the new residue-residue separation
distance format to submit a prediction of residue contacts defined as Cb-Cb
distances < 8 A.
PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC # <- entire target sequence (up to 50
HDFGRTYIWQMSD # residues per line)
1 9 0 8 0.70
1 10 0 8 0.70 # <- indices of residues: i and j (integers),
1 12 0 8 0.60 # <- the range of Cb-Cb distance predicted
1 14 0 8 0.20 # for the residue pair: d1 and d2 (real),
1 15 0 8 0.10 # <- probability of the distance between
1 17 0 8 0.30 # Cb atoms being within the specified
1 19 0 8 0.50 # range: p (real)
2 8 0 8 0.90
3 7 0 8 0.70
3 12 0 8 0.40
3 14 0 8 0.70
3 15 0 8 0.30
4 6 0 8 0.90
7 14 0 8 0.30
9 14 0 8 0.50
END
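The record layout above (i, j, d1, d2, p) can be read with a few lines of code. The following Python sketch is illustrative only and is not part of the CASP format definition: the names Contact and parse_rr_contacts are hypothetical, fields are assumed to be whitespace-separated, and '#' is assumed to start an annotation as in the example.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    i: int      # index of the first residue
    j: int      # index of the second residue
    d1: float   # lower bound of the predicted Cb-Cb distance range
    d2: float   # upper bound of the predicted Cb-Cb distance range
    p: float    # probability that the distance lies within the range


def parse_rr_contacts(lines: List[str]) -> List[Contact]:
    """Collect contact records from the lines of an RR prediction.

    Assumption: anything after '#' is an annotation, and every line that
    is not five fields of numbers (headers, sequence, END) is skipped.
    """
    contacts = []
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        if len(fields) != 5:
            continue
        try:
            i, j = int(fields[0]), int(fields[1])
            d1, d2, p = (float(x) for x in fields[2:])
        except ValueError:
            continue  # e.g. a five-word METHOD line
        if d1 > d2 or not 0.0 <= p <= 1.0:
            raise ValueError(f"inconsistent contact record: {raw!r}")
        contacts.append(Contact(i, j, d1, d2, p))
    return contacts
```

Skipping non-numeric lines rather than rejecting them keeps the sketch short; a stricter reader would validate the header records as well.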
Example 3. An alternative alignment format for Threading/Fold Recognition
predictions
Alignments will be converted into 3D structures.
(A) Format to express unambiguous alignments to PDB entries
'mabc_A' and 'nefg'.
PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_A
M 21 V 11
P 22 D 12
N 23 A 12A
F 24 F 12B
A 25 L 13
P 32 D 22
N 33 A 23
F 34 F 24
A 35 L 25
TER
PARENT nefg
E 75 T 73
T 76 T 74
V 77 A 75
D 78 D 76
G 79 D 77
R 80 R 78
TER
END
(B) Format to express unambiguous alignments to PDB entry 'mabc_D'.
This example shows how to use the AL format to submit a prediction
for a target chain named 'A'.
PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT mabc_D
M A21 V 11
P A22 D 12
N A23 A 12A
F A24 F 12B
A A25 L 13
P A32 D 22
N A33 A 23
F A34 F 24
A A35 L 25
TER
END
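The alignment lines in both AL examples carry four fields: target residue code, target residue number (optionally prefixed with a chain name, as in 'A21'), parent residue code, and parent residue number (optionally followed by an insertion code, as in '12A'). A minimal Python sketch for splitting such a line is given below. It is an illustration under these assumptions, not an official parser; AlignedPair and parse_al_line are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class AlignedPair(NamedTuple):
    target_aa: str
    target_chain: Optional[str]  # None when no chain name is given
    target_num: int
    parent_aa: str
    parent_num: int
    parent_icode: str            # insertion code, '' when absent


# Assumed token shapes, taken from the examples: 'A21' or '21' for the
# target, '12' or '12A' for the parent.
_TARGET = re.compile(r"^([A-Za-z]?)(\d+)$")
_PARENT = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def parse_al_line(line: str) -> AlignedPair:
    """Split one alignment line of an AL prediction into its parts."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    t_aa, t_pos, p_aa, p_pos = fields
    t = _TARGET.match(t_pos)
    p = _PARENT.match(p_pos)
    if t is None or p is None:
        raise ValueError(f"unrecognized residue number in {line!r}")
    return AlignedPair(
        t_aa, t.group(1) or None, int(t.group(2)),
        p_aa, int(p.group(1)), p.group(2),
    )
```

Keeping the insertion code separate from the numeric part means '12', '12A' and '12B' remain distinct parent positions.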
Example 4. Predictions of multichain targets (chains A and B)
(A) An example of 3D atomic coordinates model prediction.
PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT N/A
ATOM 17 N VAL A 3 6.308 -12.396 -0.278 1.00 1.70
ATOM 18 CA VAL A 3 5.190 -12.030 -1.187 1.00 1.70
ATOM 19 C VAL A 3 3.954 -12.870 -0.844 1.00 1.70
ATOM 20 O VAL A 3 2.834 -12.471 -1.090 1.00 1.70
ATOM 21 CB VAL A 3 5.608 -12.274 -2.641 1.00 1.70
ATOM 22 CG1 VAL A 3 5.542 -13.771 -2.959 1.00 1.70
ATOM 23 CG2 VAL A 3 4.664 -11.514 -3.573 1.00 1.70
ATOM 24 N GLU A 4 4.146 -14.029 -0.272 1.00 1.70
ATOM 25 CA GLU A 4 2.976 -14.882 0.086 1.00 1.60
ATOM 26 C GLU A 4 2.153 -14.190 1.175 1.00 1.50
ATOM 27 O GLU A 4 0.942 -14.141 1.109 1.00 1.40
ATOM 28 CB GLU A 4 3.465 -16.238 0.597 1.00 1.30
ATOM 29 CG GLU A 4 2.336 -17.264 0.479 1.00 1.20
ATOM 30 CD GLU A 4 2.929 -18.671 0.391 1.00 1.10
ATOM 31 OE1 GLU A 4 4.056 -18.846 0.823 1.00 1.00
ATOM 32 OE2 GLU A 4 2.246 -19.551 -0.108 1.00 0.90
REMARK
REMARK NOTE: Predictors should NOT use a TER separator between chains
REMARK if a multichain independent segment of the structure has to
REMARK be evaluated as one fragment
REMARK
ATOM 1 N GLU B 1 10.982 -9.774 1.377 1.00 0.50
ATOM 2 CA GLU B 1 9.623 -9.833 1.984 1.00 0.50
ATOM 3 C GLU B 1 8.913 -11.104 1.521 1.00 0.50
ATOM 4 O GLU B 1 9.187 -11.630 0.461 1.00 0.50
ATOM 5 CB GLU B 1 8.814 -8.614 1.546 1.00 0.50
ATOM 6 CG GLU B 1 7.372 -8.754 2.039 1.00 0.50
ATOM 7 CD GLU B 1 7.339 -8.625 3.562 1.00 0.50
ATOM 8 OE1 GLU B 1 8.370 -8.307 4.131 1.00 0.50
ATOM 9 OE2 GLU B 1 6.284 -8.846 4.132 1.00 0.50
ATOM 10 N THR B 2 7.998 -11.599 2.304 1.00 1.60
ATOM 11 CA THR B 2 7.266 -12.832 1.907 1.00 1.60
ATOM 12 C THR B 2 6.096 -12.456 1.005 1.00 1.60
ATOM 13 O THR B 2 5.008 -12.217 1.466 1.00 1.60
ATOM 14 CB THR B 2 6.731 -13.533 3.157 1.00 1.60
ATOM 15 OG1 THR B 2 7.662 -13.379 4.220 1.00 1.60
ATOM 16 CG2 THR B 2 6.526 -15.019 2.864 1.00 1.60
TER
END
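The role of TER in the example above can be made concrete with a short Python sketch that groups ATOM records into fragments, closing a fragment only at TER or END. This is an illustration, not the evaluation code: split_fragments is a hypothetical name, and the records are assumed to be whitespace-separated exactly as printed here (PDB-style files normally use fixed columns).

```python
from typing import List, Tuple

Atom = Tuple[str, int, str]  # (chain name, residue number, atom name)


def split_fragments(lines: List[str]) -> List[List[Atom]]:
    """Group ATOM records into fragments delimited by TER or END.

    Assumption: an ATOM line has 11 whitespace-separated tokens when a
    chain name is present and 10 when it is absent.
    """
    fragments: List[List[Atom]] = []
    current: List[Atom] = []
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "ATOM":
            if len(tokens) not in (10, 11):
                raise ValueError(f"unexpected ATOM record: {raw!r}")
            chain = tokens[4] if len(tokens) == 11 else ""
            # The residue number is the sixth token from the end.
            current.append((chain, int(tokens[-6]), tokens[2]))
        elif tokens[0] in ("TER", "END") and current:
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments
```

With no TER between chains A and B, both chains end up in a single fragment, which matches the intent of the REMARK in the example.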
(B) An example of how to use the RR format to submit a prediction of
interchain (chains A and B) residue-residue contacts defined as Cb-Cb
distances < 8 A.
PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
HLEGSIGILLKKHEIVFDGC # <- entire target sequence (up to 50
HDFGRTYIWQMSD # residues per line)
A1 B9 0 8 0.70
A1 B10 0 8 0.70 # <- indices of residues: Ai and Bj,
A1 B12 0 8 0.60 # <- the range of Cb-Cb distance predicted
A1 B14 0 8 0.20 # for the residue pair: d1 and d2 (real),
A1 B15 0 8 0.10 # <- probability of the distance between
A1 B17 0 8 0.30 # Cb atoms being within the specified
A1 B19 0 8 0.50 # range: p (real)
A2 B8 0 8 0.90
A3 B7 0 8 0.70
A3 B12 0 8 0.40
A3 B14 0 8 0.70
A3 B15 0 8 0.30
A4 B6 0 8 0.90
A7 B14 0 8 0.30
A9 B14 0 8 0.50
END
Example 5. Order-disorder regions prediction
Example of order-disorder regions prediction.
PFRMAT DR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
H D 0.70 # <- residue code,
L D 0.80 # <- order/disorder assignment code,
E D 0.80 # <- the number specifying the associated
G D 0.60 # confidence level: 0.5 - residue not predicted
S D 0.90 # >0.5 - disordered region
I O 0.50 # <0.5 - ordered region
G O 0.40
I O 0.40
L O 0.30
L O 0.50
K O 0.50
K O 0.30
H O 0.20
E O 0.20
I O 0.40
V O 0.45
F D 0.60
D D 0.90
G D 0.60
C D 0.80
END
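A DR prediction such as the one above is often easier to inspect as a list of disordered segments. The Python sketch below collects contiguous runs of residues marked 'D'. It is illustrative only: disordered_segments is a hypothetical name, residues are numbered from 1 in the order they appear (an assumption, since the format carries no residue numbers), and '#' is assumed to start an annotation.

```python
from typing import List, Tuple


def disordered_segments(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (first, last) positions of contiguous 'D' runs, 1-based."""
    segments: List[Tuple[int, int]] = []
    start = None
    index = 0
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        # A residue record is: residue code, 'D' or 'O', confidence.
        if len(fields) != 3 or fields[1] not in ("D", "O"):
            continue
        confidence = float(fields[2])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {raw!r}")
        index += 1
        if fields[1] == "D":
            if start is None:
                start = index
        elif start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:
        segments.append((start, index))
    return segments
```

Applied to the twenty residues of the example, this sketch would report the runs at positions 1-5 and 17-20.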
Example 6. Domain boundary prediction
PFRMAT DP
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL 1
PARENT 1abc 1efg # optional; 1abc was used to assign domain #1, 1efg - #2
1 H 1 0.90
2 L 1 0.90
3 E 1 0.90
4 G 1 0.90
5 S 1 0.90
6 I 1 0.90
7 G 1 0.60
8 I - 0.60
9 L - 0.80
10 L 2 0.80
11 K 2 0.90
12 K 2 0.90
13 H 2 0.90
14 E 2 0.90
15 I 2 0.90
16 V 2 0.75
17 F - 0.60
18 D 1 0.90
19 G 1 0.90
20 C 1 0.90
END
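The per-residue records above (residue number, residue code, domain number or '-', confidence) can be condensed into segments per domain. The following Python sketch shows one way to do that. It is a hedged illustration: domain_segments is a hypothetical name, and '-' is read as "no domain assigned", which is an interpretation of the example rather than a statement of the format rules.

```python
from typing import Dict, List, Tuple


def domain_segments(lines: List[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each domain label to its (first, last) residue segments."""
    segments: Dict[str, List[Tuple[int, int]]] = {}
    for raw in lines:
        fields = raw.split("#", 1)[0].split()
        # Residue records have four fields and start with a number.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        resnum, domain = int(fields[0]), fields[2]
        if domain == "-":
            continue  # assumed: residue not assigned to any domain
        runs = segments.setdefault(domain, [])
        if runs and runs[-1][1] == resnum - 1:
            runs[-1] = (runs[-1][0], resnum)  # extend the current run
        else:
            runs.append((resnum, resnum))     # start a new run
    return segments
```

For the example above this would give domain 1 the segments 1-7 and 18-20, and domain 2 the segment 10-16, which illustrates a discontinuous domain.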
Example 7. Function prediction
PFRMAT FN
TARGET T0283
AUTHOR 1111-2222-3333
REMARK Predictor remarks
METHOD Description of methods used
MODEL 1
GO Molecular Function: 00000049 ; 00005525
EC number: 3.6.5.3
Binding site: 50-54, 76-79 ; 81, 82, 93-95
Prediction techniques: 1.1.0.0.1.0
Comment: my comment
END
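The 'Binding site' value in the example mixes single residues and ranges. A short Python sketch for expanding such a value into explicit residue lists follows. It rests on an assumption that should be checked against the FN format rules: ';' is taken to separate alternative sites and ',' to separate residues or ranges within one site. The name parse_binding_sites is hypothetical.

```python
from typing import List


def parse_binding_sites(value: str) -> List[List[int]]:
    """Expand a value such as '50-54, 76-79 ; 81, 82' into residue lists.

    Assumption: ';' separates sites and ',' separates items in a site.
    """
    sites: List[List[int]] = []
    for site in value.split(";"):
        residues: List[int] = []
        for item in site.split(","):
            item = item.strip()
            if not item:
                continue
            if "-" in item:
                first, last = (int(x) for x in item.split("-", 1))
                residues.extend(range(first, last + 1))
            else:
                residues.append(int(item))
        if residues:
            sites.append(residues)
    return sites
```

The function takes only the text after 'Binding site:'; splitting the label from the value is left to the caller.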
Example 8. Quality assessment prediction
(A) Global Model Quality Score
PFRMAT QA
TARGET T0283
AUTHOR 1111-2222-3333
METHOD Description of methods used
MODEL 1
QMODE 1
3D-JIGSAW_TS1 0.8
FORTE1_AL1.pdb 0.7
END
(B) Residue-based Quality Assessment (fragment of the table). Note that this case includes case (A), so there is no need to submit QMODE 1 predictions in addition to QMODE 2.
PFRMAT QA
TARGET T0283
AUTHOR 1111-2222-3333
REMARK Error estimate is CA-CA distance in Angstroms
METHOD Description of methods used
MODEL 1
QMODE 2
3D-JIGSAW_TS1 0.8 10.0 6.5 5.0 2.0 1.0 ...
FORTE1_AL1.pdb 0.7 8.0 5.5 4.5 X X ...
END
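Both QMODE variants share one line layout: a model name, a global score, and (in QMODE 2 only) a list of per-residue error estimates. The Python sketch below reads such a line. It is an illustration under stated assumptions: parse_qa_line is a hypothetical name, 'X' is assumed to mark a residue without an estimate, and the '...' shown in the example is treated as an elision of the table that a real submission would not contain.

```python
from typing import List, Optional, Tuple


def parse_qa_line(line: str) -> Tuple[str, float, List[Optional[float]]]:
    """Split a QA record into (model name, global score, residue errors).

    Per-residue values are floats, or None where the record holds 'X'.
    A QMODE 1 record yields an empty error list.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected a model name and a score: {line!r}")
    name, score = fields[0], float(fields[1])
    errors: List[Optional[float]] = [
        None if token == "X" else float(token) for token in fields[2:]
    ]
    return name, score, errors
```

Returning None for 'X' keeps the positions of the list aligned with residue positions, so missing estimates do not shift later values.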