Introduction

CASP experiments aim at establishing the current state of the art in protein structure prediction, identifying what progress has been made, and highlighting where future effort may be most productively focused.

There have been four previous experiments, in 1994, 1996, 1998 and 2000. Full details of these may be found at the CASP web site, and in the special issues of the journal PROTEINS: 23(5), 1995; Suppl 1, 1997; Suppl 3, 1999 and Suppl 5, 2001. In CASP4, 163 prediction groups from all around the world took part.

We now announce the fifth experiment. As before, the goal is to obtain an in-depth and objective assessment of our current abilities and inabilities in this area. To this end, participants will predict as much as possible about a set of soon-to-be-known structures. These will be true predictions, not ‘post-dictions’ made on already known structures.

The experiment will begin in May 2002, when the first prediction targets will be made available. The prediction season will run through August. There will be a meeting at the Asilomar Conference Center in California, December 1-5, 2002, to discuss the results.

Scope of CASP5 and Related Experiments

As in earlier CASPs, all types of methods for predicting protein structure will be considered, ranging from comparative modeling through fold recognition to 'new fold' prediction. Most emphasis will be on tertiary structure prediction, but secondary structure and contact prediction methods will also be included. A new category, prediction of disordered regions in proteins, will be introduced. In addition, CASP5 will include the following related activities, which will extend its scope substantially:

1. CAFASP3: In the era of genome sequencing, rapid protein structure modeling methods have a critical role to play. This experiment, led by Dani Fischer, will evaluate automatic methods of predicting protein structure, using the CASP targets. All targets will be processed through prediction servers that register for the experiment. Models will be evaluated by the same numerical criteria as used for CASP5, as well as the CAFASP criteria, and a session at the CASP5 meeting will compare the performance of the servers with that obtained when full human participation is allowed. Further details of this experiment may be found at: http://www.cs.bgu.ac.il/~dfischer/CAFASP3.

2. Large-scale benchmarking: It is hoped that the results of well-run benchmarking experiments such as EVA and LIVEBENCH will also be included in the CASP meeting and publications.

3. Ten Most Wanted: Results of the community effort to produce useful models of ten proteins of particular biological or medical interest will also be discussed at the meeting.

Experiment

The broad goals of the CASP5 experiment are to address the following questions about the current state of the art in protein structure prediction:

Are the models produced similar to the corresponding experimental structure?
Is the mapping of the target sequence onto the proposed structure (i.e. the alignment) correct?
Have similar structures that a model can be based on been identified?
Are the details of the models correct?
Has there been progress from the earlier CASPs?
What methods are most effective?
Where can future effort be most productively focused?

In addition, CASP5 will focus particularly on areas of prediction that previous CASPs have shown to be current bottlenecks to progress. Suggested problem areas are:

Alignment of a sequence onto a template fold.
Model refinement - improving the accuracy of an initial model.
Accurately modeling regions of insertion and deletion relative to a template structure.
Improved fold recognition, particularly for analogous and analogous/new fold targets.
Improved New Fold methods.
Bold new methods aimed at removing one of the current bottlenecks to progress.

To facilitate progress in some of these areas, partly built models will be provided where necessary: for example, correct alignments as a starting point for loop building, and full approximate models as a starting point for refinement. The molecular dynamics (MD) community is encouraged to participate, making use of these starting points.

The set of problem areas may be revised, following discussion with the consultancy groups. Participants are advised to check for revisions.

Prediction Targets

For the experiment to succeed, it is essential that we obtain the help of the experimental community. As in previous CASPs, we will invite protein crystallographers and NMR spectroscopists to provide details of structures they expect to have made public before 1st October 2002. Prediction targets will be made available through the web site. All targets will be assigned an expiry date, and predictions must be received and accepted before that expiration date.

Participation

Participation is open to all. Those interested in receiving mailings concerning progress of the experiment may also register as 'observers'. CAFASP predictors must register at the CAFASP web site. Participation in both experiments (initial submission of server-generated models to CAFASP and subsequent submission of models based on human interaction to CASP) is encouraged. Note that separate registrations are required for CASP and CAFASP participation.

Assessment of Predictions

As in previous CASPs, independent assessors will evaluate the predictions. There will be three assessors, representing expertise in the comparative modeling, fold recognition and new fold prediction areas. Assessors will be provided with the results of numerical evaluation of the predictions, and will judge the results primarily on that basis. They will be asked to focus particularly on the effectiveness of different methods.

As CASP has grown, the work of the assessors has become more and more demanding. To help them with their task, predictors and target submitters will be asked to assist in the evaluation of models.

Numerical evaluation criteria will as far as possible be very similar to those used in CASP4, although the assessors may be permitted to introduce some additional ones. Appropriate members of the prediction community will be asked to develop methods of assessing the statistical significance of performance ranking, a controversial area in the past.

Release of Results

1. All CASP predictions and evaluations will be made available through this web site, shortly before the meeting.

2. The proceedings of the meeting will be published. In recent CASPs, the large number of predictors, together with the limited number of published predictor papers, has been one of the main causes of excessive focus on winners and losers. To combat that, the mix of papers in the special issue will be altered. There will not be a predefined number of 'winner' group papers. Instead, papers that focus on problem areas and solutions will be included.

3. In a further change to increase recognition of the contributions of more predictors, a new web site will be developed, allowing all participants to report their work if they wish, and encouraging vigorous discussion of the results.

Meeting

A meeting will be held 1-5 December, 2002 at Asilomar, California, USA to evaluate the results of the prediction experiment. The meeting will be limited to about 200 participants and precedence will be given to active predictors. It is hoped that some financial assistance will be available for the more successful predictors. It is expected that the format of the meeting will change from previous CASPs, concentrating more on progress in problem areas rather than on the best performers in each section, irrespective of whether there has been a significant change since the last CASP.

Organizing Committee

John Moult, CARB, University of Maryland, USA, (jmoult@tunc.org)
Krzysztof Fidelis, Lawrence Livermore National Laboratory, USA, (fidelis@llnl.gov)
Adam Zemla, Lawrence Livermore National Laboratory, USA, (adamz@llnl.gov)
Tim Hubbard, Sanger Centre, Hinxton, UK, (th@sanger.ac.uk)

Formats for Prediction Submission

General rules

Predictions for CASP5 may be submitted in five formats, which make up four format categories (models in AL format are treated as equivalent to TS):

   TS    # 3D atomic coordinates (Tertiary Structure) prediction
   AL    # Format to express unambiguous ALignments to PDB entries
   SS    # Secondary Structure prediction
   RR    # Residue-Residue separation distance prediction
   DR    # Order-Disorder Regions prediction

Note: DR is a new prediction category that has been introduced for CASP5 experiment.

One team may make a prediction of a target by submitting up to five models in TS/AL, SS, RR, and DR formats (models in AL format are considered equivalent to those in TS format and will be translated to TS internally before evaluation). Most of the evaluation and assessment will focus on the model labeled '1' (model index 1, see MODEL record).
Each submission may contain only one of the four format categories.
Submission of each model begins with PFRMAT and ends with END record.
Each submission may contain only one model, beginning with the MODEL record, ending with the END record, and containing no repeated target residues.
Submission of a duplicate model (same target, format category, group and model index) will replace the previously accepted model, provided it is received before the target has expired.

Each submitted model is automatically verified by the format verification server. Only accepted models will be assigned an ACCESSION CODE. A unique ACCESSION CODE is composed of the target number, prediction format category, prediction group number and model index.

   Examples:

   Accession code  T0045SS067_1  has the following components:
     T0045   target number
     SS      Secondary Structure prediction (PFRMAT SS)
     067     prediction group 67
     1       model index 1 (see MODEL record)

   Accession code  T0044TS005_2  has the following components:
     T0044   target number
     TS      Tertiary Structure (3D atomic coordinates, PFRMAT TS)
     005     prediction group 5
     2       model index 2 (by default considered as FINAL/REFINED)

   Accession code  T0044TS005_2u  has the following components:
     T0044   target number
     TS      Tertiary Structure (3D atomic coordinates, PFRMAT TS)
     005     prediction group 5
     2u      model index 2 UNREFINED set of coordinates
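The accession code structure described above can be taken apart mechanically. The following Python sketch assumes the layout implied by the three examples (a 'T' plus four digits, a two-letter format code, a three-digit group number, an underscore, a model index from 1 to 5, and an optional 'u' for UNREFINED coordinates). The function names parse_accession and make_accession are hypothetical helpers, not part of any Prediction Center software.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the examples above: T + 4 digits, format code,
# 3-digit group number, "_", model index 1-5, optional "u" for UNREFINED.
ACCESSION_RE = re.compile(
    r"^(?P<target>T\d{4})"
    r"(?P<fmt>TS|AL|SS|RR|DR)"
    r"(?P<group>\d{3})"
    r"_(?P<model>[1-5])"
    r"(?P<unrefined>u?)$"
)


class Accession(NamedTuple):
    target: str
    fmt: str
    group: int
    model: int
    unrefined: bool


def parse_accession(code: str) -> Accession:
    """Split an accession code such as 'T0044TS005_2u' into its components."""
    m = ACCESSION_RE.match(code)
    if m is None:
        raise ValueError(f"not a recognised accession code: {code!r}")
    return Accession(
        target=m["target"],
        fmt=m["fmt"],
        group=int(m["group"]),
        model=int(m["model"]),
        unrefined=m["unrefined"] == "u",
    )


def make_accession(target: str, fmt: str, group: int, model: int,
                   unrefined: bool = False) -> str:
    """Inverse of parse_accession: build the code from its parts."""
    return f"{target}{fmt}{group:03d}_{model}{'u' if unrefined else ''}"


print(parse_accession("T0044TS005_2u"))
# Accession(target='T0044', fmt='TS', group=5, model=2, unrefined=True)
```

Keeping the group number zero-padded to three digits in make_accession mirrors the '005' and '067' seen in the examples.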

Format description

All submissions contain records described below. Each of these records must begin with a standard keyword. In all submissions standard keywords must begin in the first column of a record. The keyword set is as follows:

PFRMAT     Format specification code:  TS , SS , RR , DR or AL
TARGET     Target identifier from the CASP5 target table
AUTHOR     XXXX-XXXX-XXXX   Registration code of the Group Leader
REMARK     Comment record (may appear anywhere, optional)
METHOD     Records describing the methods used
MODEL      Beginning of the data section for the submitted model
PARENT     Specifies structure template used to generate the TS/AL model 
TER        Terminates independent segments of structure in the TS/AL model
END        End of the submitted model

Models should be submitted in Plain Text format.
PLEASE DON'T USE 'tab' AS A SEPARATOR. PLEASE USE 'space' INSTEAD.

Record PFRMAT is used for all submissions.

   PFRMAT TS
     TS  indicates that submission contains 3D atomic coordinates
         in standard PDB format

   PFRMAT SS
     SS  indicates that submission contains secondary structure
         prediction

   PFRMAT RR
     RR  indicates that submission contains residue-residue 
         separation distance prediction

   PFRMAT AL
     AL  indicates that submission contains unambiguous alignments
         to PDB entries

   PFRMAT DR
     DR  indicates that submission contains order-disorder regions
         prediction

Record TARGET is used for all submissions.

   TARGET Txxxx
     Txxxx indicates the ID of the target being predicted.
          Only targets from the CASP5 target list are valid.

Note: for some targets residue numbering may be non-standard. Please check residue numbering in "Template PDB file" provided for each Txxxx target from the CASP5 "Target list".

Record AUTHOR is used for all submissions.

   AUTHOR XXXX-XXXX-XXXX
     XXXX-XXXX-XXXX indicates the Group Leader's registration code.
          This code is the prediction submission code obtained upon
          registration at the CASP5 WEB site (Prediction Center).
          Members of prediction groups who intend to submit predictions
          should use the registration code of the Group Leader for all
          predictions submitted by that group.

REMARK Optional. PDB style 'REMARK' records may be used anywhere in the submission. These records may contain any text and will in general not influence evaluation.

METHOD records are used for all submissions.
These records describe the methods used. Predictors are urged to provide as full a description of the methods as possible, including references, data libraries used, and values of default and non-default parameters. These descriptions will be made available via the Prediction Center WEB pages as well as printed along with the other materials distributed at the meeting. A length of 100-500 words is suggested.
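A minimal sketch of assembling the header records of a submission (PFRMAT, TARGET, AUTHOR, optional REMARK, METHOD, MODEL) in Python. The helper name submission_header and the 72-character wrapping width are illustrative assumptions; the only rules taken from this document are the record order shown in the examples, the ban on tab separators and the suggested 100-500 word METHOD length.

```python
import textwrap
import warnings


def submission_header(pfrmat: str, target: str, author: str,
                      method_text: str, model: int = 1,
                      remarks: tuple[str, ...] = ()) -> list[str]:
    """Return the header records that precede the model data.

    The METHOD description is wrapped over as many METHOD records as
    needed; 72 characters of text plus the 'METHOD ' keyword keeps each
    record under 80 columns (an assumed, conservative line length).
    """
    n_words = len(method_text.split())
    if not 100 <= n_words <= 500:
        # The length is only suggested, so warn rather than refuse.
        warnings.warn(f"METHOD description has {n_words} words; "
                      "100-500 is suggested")
    records = [f"PFRMAT {pfrmat}", f"TARGET {target}", f"AUTHOR {author}"]
    records += [f"REMARK {text}" for text in remarks]
    records += [f"METHOD {chunk}" for chunk in textwrap.wrap(method_text, 72)]
    records.append(f"MODEL  {model}")
    # Tabs must not be used as separators anywhere in a submission.
    for record in records:
        if "\t" in record:
            raise ValueError(f"tab character in record: {record!r}")
    return records
```

Keywords are always emitted at the start of the string, so they land in the first column as required.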

Record MODEL is used for all submissions.
Signifies the beginning of model data (3D atomic coordinates, an unambiguous alignment to a PDB entry, secondary structure prediction, residue-residue separation distance prediction, or order-disorder region prediction).

   MODEL  n  [REFINED|UNREFINED]
     n          Model index n is used to indicate predictor's ranking
                according to her/his belief which model is closest to the 
                target structure (1 <= n <= 5). Model index is included
                automatically in the ACCESSION CODE.
     REFINED    The set of coordinates labeled REFINED will be considered
                as a final model (to allow the evaluation of the results
                of an automated refinement process, such as molecular 
                dynamics). Models submitted without any label: REFINED or 
                UNREFINED will be considered by default as final.
     UNREFINED  Coordinates labeled UNREFINED will be compared only to 
                the final set (REFINED) with the same model index n, to 
                evaluate the effectiveness of the refinement method. If 
                UNREFINED model is submitted, a REFINED model must be 
                submitted as well. The letter "u" will be added to the 
                model index in the ACCESSION CODE of the UNREFINED model.
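The MODEL rules above (index 1 to 5, an unlabeled model counts as final/REFINED, and an UNREFINED model requires a REFINED one with the same index) can be checked with a short Python sketch. parse_model_record and missing_refined are hypothetical names; this is not the verification server's code.

```python
def parse_model_record(line: str) -> tuple[int, str]:
    """Return (model index, label) for a record like 'MODEL  1  UNREFINED'."""
    fields = line.split()
    if not fields or fields[0] != "MODEL" or len(fields) not in (2, 3):
        raise ValueError(f"not a MODEL record: {line!r}")
    index = int(fields[1])
    if not 1 <= index <= 5:
        raise ValueError(f"model index must be 1-5, got {index}")
    # A model without a label is considered final, i.e. REFINED.
    label = fields[2] if len(fields) == 3 else "REFINED"
    if label not in ("REFINED", "UNREFINED"):
        raise ValueError(f"unknown MODEL label: {label!r}")
    return index, label


def missing_refined(model_lines: list[str]) -> set[int]:
    """Model indices that have an UNREFINED model but no REFINED partner."""
    seen: dict[int, set[str]] = {}
    for line in model_lines:
        index, label = parse_model_record(line)
        seen.setdefault(index, set()).add(label)
    return {index for index, labels in seen.items()
            if "UNREFINED" in labels and "REFINED" not in labels}
```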

Record PARENT is used for all submissions in the TS (and AL) format.
PARENT record indicates structure templates used to generate any independent segment of MODEL (see description of the TS format below). The PARENT record should be placed as the first record of any such independent segment. Only one PARENT record per structure segment is allowed.

   PARENT N/A
     Indicates an ab initio prediction, not directly based on any known
     structure. Note that this is the only indication in the file that the
     prediction is ab initio, so is a critical piece of information.

   PARENT NONE [n1 n2]
     Indicates that the predictor believes that there is no structure in
     the present PDB that is close enough to be used as a template. This
     is an entry requested by those predictors who use threading and
     sequence comparison methods. With structural genomics projects being
     designed to determine the structure of proteins with novel folds, the
     ability to predict when a fold is unknown is becoming increasingly
     important, and predictors are urged to make such submissions.
     Delimiters n1 n2 indicate the range of the target sequence predicted
     as having no homologue in the current PDB.
     Omission of n1 n2 indicates the entire target (see Example 1 (C)).

   PARENT mabc_A
     Indicates that the model or the independent segment of structure is
     based on a single PDB entry mabc chain A (use _A to indicate chain A).
     Most threading and sequence search submissions would now be submitted
     with this form of the PARENT record. A comparative modeler using a
     single parent structure would also use this form. Note that, in order
     to be accepted, the code must correspond to a current PDB entry.

   PARENT mcdc ndef_g [ohij_k ...]
     Is used only in comparative modeling and indicates that the model is
     based on more than one structure template. Up to five PDB chains
     may be listed here with additional detailed information included in
     the METHOD records. In threading and sequence search, subdomains of
     the target structure found to correspond to different known folds
     should be submitted as independent segments of structure with
     reference to only one PDB chain per segment.
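The four PARENT forms can be told apart with a simple syntactic check. This Python sketch assumes PDB codes are four characters starting with a digit 1-9, optionally followed by "_" and a one-character chain ID, and that NONE ranges use plain residue numbers. It cannot confirm that a code is a current PDB entry, which acceptance additionally requires. classify_parent is a hypothetical helper.

```python
import re

# Assumed shape of a PDB chain reference: e.g. 1abc or 1abc_A.
PDB_CHAIN_RE = re.compile(r"^[1-9][A-Za-z0-9]{3}(_[A-Za-z0-9])?$")


def classify_parent(line: str) -> tuple[str, list[str]]:
    """Return (kind of PARENT record, its arguments)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "PARENT":
        raise ValueError(f"not a PARENT record: {line!r}")
    args = fields[1:]
    if args == ["N/A"]:
        return "ab initio", []
    if args[0] == "NONE":
        if len(args) == 1:
            return "no template (entire target)", []
        # Assumes n1 and n2 are plain residue numbers.
        if len(args) == 3 and args[1].isdigit() and args[2].isdigit():
            return "no template (range)", args[1:]
        raise ValueError(f"malformed PARENT NONE record: {line!r}")
    if len(args) > 5:
        raise ValueError("at most five template chains may be listed")
    bad = [code for code in args if not PDB_CHAIN_RE.match(code)]
    if bad:
        raise ValueError(f"malformed PDB code(s): {bad}")
    kind = "single template" if len(args) == 1 else "multiple templates"
    return kind, args
```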

Record TER is used to terminate an independent segment of structure (PFRMAT TS and PFRMAT AL).

TER

3D atomic coordinates (PFRMAT TS).
Standard PDB atom records are used for the atomic coordinates. Each record must be 80 columns long, padded with spaces where necessary (see the target template PDB files provided in the specific target descriptions available through the CASP5 target table). This requirement is necessitated by some of the software used in the evaluation of predictions.
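As an illustration of the fixed-column layout, the following Python sketch writes one ATOM record using the standard PDB column positions and pads it to 80 columns; it reproduces the first ATOM line of Example 1. The function name format_atom is hypothetical. Occupancy defaults to 1.0 (atom predicted) and the temperature factor column carries the optional error estimate, following the TS conventions for those two fields.

```python
def format_atom(serial: int, name: str, res_name: str, res_seq: int,
                x: float, y: float, z: float,
                occupancy: float = 1.0, error: float = 0.0,
                chain: str = " ", icode: str = " ") -> str:
    """Return one 80-column PDB ATOM record."""
    # PDB convention: atom names shorter than 4 characters start in column 14.
    name_field = name if len(name) == 4 else f" {name:<3}"
    record = (
        f"ATOM  {serial:>5} {name_field}"                     # cols 1-16
        f" {res_name:>3} {chain:1}{res_seq:>4}{icode:1}   "    # cols 17-30
        f"{x:8.3f}{y:8.3f}{z:8.3f}"                            # cols 31-54
        f"{occupancy:6.2f}{error:6.2f}"                        # cols 55-66
    )
    return record.ljust(80)  # pad with spaces to the full 80 columns


print(format_atom(1, "N", "GLU", 1, 10.982, -9.774, 1.377, 1.0, 0.50).rstrip())
```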

Coordinates for each model or an independent structure segment should begin with a single PARENT record and terminate with a TER record (see above).

It is requested that coordinate data be supplied for at least all non-hydrogen main chain atoms, i.e. the N, CA, C and O atoms of every residue. In particular, if a method predicts only CA atoms, predictors are encouraged to build the main chain atoms for every residue before submission to CASP. One program that can make such a conversion is the Maxsprout server of Liisa Holm and co-workers. (If only CA atoms were submitted, most of the analysis software could not be run, which would severely limit the evaluation of that prediction.) When multiple independent segments of structure are used in a prediction, they will be evaluated separately, with no assumption of a common frame of reference between the segments. For any given MODEL, no target residue may be repeated among all such independent structure segments. The potential multi-domain nature of targets will be addressed in the evaluation even if the prediction is made in a single frame of reference (i.e. without separation into multiple segments of structure). For such predictions, segmentation should only be used to allow multiple model predictions (effectively up to 5 predictions for each such domain).

   Notes:
     - atoms for which a prediction has been made must contain "1.0" in
       the occupancy field; those for which no prediction is made must
       either contain "0.0" in that field or be skipped altogether
     - error estimates, in Angstroms, when given should be provided in the 
       temperature factor field
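To check the segment rules (each segment opened by PARENT and closed by TER, no target residue repeated across segments, occupancy 0.0 marking atoms that are not predicted), a model's records can be split and compared as in this Python sketch. It identifies residues by chain ID plus residue number and insertion code (PDB columns 22-27). split_segments and repeated_residues are hypothetical helpers.

```python
from collections import Counter
from typing import Optional

ResidueId = tuple[str, str]  # (chain ID, residue number + insertion code)


def split_segments(model_lines: list[str]) -> list[set[ResidueId]]:
    """Collect the predicted residues of each PARENT ... TER segment."""
    segments: list[set[ResidueId]] = []
    current: Optional[set[ResidueId]] = None
    for line in model_lines:
        keyword = line[:6].strip()
        if keyword == "PARENT":
            current = set()
        elif keyword == "TER":
            if current is not None:
                segments.append(current)
            current = None
        elif keyword == "ATOM" and current is not None:
            occupancy_text = line[54:60].strip()  # columns 55-60
            if occupancy_text and float(occupancy_text) == 0.0:
                continue  # occupancy 0.0 means "no prediction made"
            current.add((line[21:22], line[22:27].strip()))
    return segments


def repeated_residues(segments: list[set[ResidueId]]) -> set[ResidueId]:
    """Residues that appear in more than one segment of the same MODEL."""
    counts = Counter(res for segment in segments for res in segment)
    return {res for res, n in counts.items() if n > 1}
```

Applied to Example 1(B), residues 1, 2 and 4 fall in the first segment and residue 3 in the second, so no residue is reported as repeated.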

An unambiguous alignment to a PDB entry used for threading predictions (PFRMAT AL).
Alignment for each model or an independent structure segment should begin with a single PARENT record and terminate with a TER record (see above). The (four column) alignment data records provide: target residue one letter symbol, target residue sequence number, PDB residue one letter symbol, and PDB residue sequence number with an insertion code if necessary (see Example 4):

   aa1 n1  aa2 n2

   Note:
     - residues for which no prediction is made must be skipped
     - if a chain ID is specified in the PDB template of the target, then 
       the target residue sequence number should be composed of a chain ID 
       and residue number, e.g. A2, B44
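Each AL data record can be parsed into its four fields, treating a leading letter on the target number as a chain ID and a trailing letter on the PDB number as an insertion code, as in Example 4. This Python sketch is an assumption about how such records might be read; parse_al_record and AlignedPair are hypothetical names.

```python
import re
from typing import NamedTuple

# Target number: optional chain ID prefix (e.g. A21).
TARGET_RES_RE = re.compile(r"^(?P<chain>[A-Za-z]?)(?P<num>-?\d+)$")
# PDB number: optional insertion code suffix (e.g. 12A).
PDB_RES_RE = re.compile(r"^(?P<num>-?\d+)(?P<icode>[A-Za-z]?)$")


class AlignedPair(NamedTuple):
    target_aa: str
    target_chain: str
    target_num: int
    pdb_aa: str
    pdb_num: int
    pdb_icode: str


def parse_al_record(line: str) -> AlignedPair:
    """Parse an 'aa1 n1 aa2 n2' alignment record."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns: {line!r}")
    aa1, n1, aa2, n2 = fields
    t = TARGET_RES_RE.match(n1)
    p = PDB_RES_RE.match(n2)
    if t is None or p is None:
        raise ValueError(f"bad residue number in {line!r}")
    if not (len(aa1) == 1 and aa1.isalpha() and len(aa2) == 1 and aa2.isalpha()):
        raise ValueError(f"residues must be one-letter codes: {line!r}")
    return AlignedPair(aa1, t["chain"], int(t["num"]),
                       aa2, int(p["num"]), p["icode"])
```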

The PDB code with chain extension of the structure the alignment is based on should be placed in the PARENT record. Only one PDB code per independent structure segment is allowed. PDB codes should refer to structures containing at least the main chain atomic coordinates (see the TS format). As in the case of coordinate submissions, when multiple independent segments of structure are used in a prediction, they will be evaluated separately, with no assumption of a common frame of reference between the segments. For any given MODEL, no target residue may be repeated among all such independent structure segments. The potential multi-domain nature of targets will be addressed in the evaluation even if the prediction is made in a single frame of reference (i.e. without separation into multiple segments of structure). For such predictions, segmentation should only be used to allow multiple model predictions (effectively up to 5 predictions for each such domain).
Note: The facility to translate sequence - structure alignments (AL format) into standard PDB atom records (TS format) is available as an additional AL2TS service.

Secondary structure prediction (PFRMAT SS).
Data in this format is inserted between MODEL and END records of the submission file.
The (three column) format record consists of residue code, secondary structure assignment code, and a number specifying the associated confidence level:

   aa  ss  p

The symbols for the 3 state secondary structure are 'H'=helix, 'E'=strand, 'C'=Coil. Confidence level is a probability of a residue being predicted correctly with values in the range of 0.0 - 1.0. The entire sequence of the target should always be given. If parts cannot be predicted a probability of 0.0 should be used.
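A minimal Python sketch of validating an SS block: three fields per residue, a state among H, E and C, a confidence between 0.0 and 1.0, and a residue column that spells out the entire target sequence. parse_ss is a hypothetical helper; the '#' annotations in Example 2 are explanatory only and are assumed not to be part of a real submission.

```python
from typing import NamedTuple

SS_STATES = {"H", "E", "C"}  # helix, strand, coil


class SSRecord(NamedTuple):
    aa: str
    state: str
    p: float


def parse_ss(lines: list[str], target_seq: str) -> list[SSRecord]:
    """Parse 'aa ss p' records and check they cover the whole target."""
    records = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {n}: expected 'aa ss p', got {line!r}")
        aa, state, p_text = fields
        p = float(p_text)
        if state not in SS_STATES:
            raise ValueError(f"line {n}: unknown state {state!r}")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"line {n}: confidence {p} outside 0.0-1.0")
        records.append(SSRecord(aa, state, p))
    # Every residue must be listed; unpredicted ones carry p = 0.0.
    if "".join(r.aa for r in records) != target_seq:
        raise ValueError("records must give the entire target sequence in order")
    return records
```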

Order-disorder regions prediction (PFRMAT DR).
Data in this format is inserted between MODEL and END records of the submission file.
The (three column) format record consists of residue code, Order/Disorder prediction code, and a number specifying the associated confidence level:

   aa  OD  p

The symbols for the 2 state order/disorder prediction are 'O'=order, 'D'=disorder. Last column should indicate a probability of a residue being in the disordered region. The value of this confidence level is in the range of 0.0 - 1.0. The entire sequence of the target should always be given. If parts cannot be predicted a probability of 0.5 should be used (see Example 6).
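The DR rules can be applied when writing records from per-residue disorder probabilities, as in this Python sketch. Unpredicted residues get 0.5, as required above. The O/D label is derived from p using a 0.5 cut-off, which is purely an assumption for illustration; this document does not say how the label and the probability must relate. dr_records is a hypothetical helper.

```python
def dr_records(sequence: str, p_disorder: dict[int, float],
               threshold: float = 0.5) -> list[str]:
    """Build 'aa OD p' lines for every residue (positions are 1-based).

    Residues missing from p_disorder are written with p = 0.5. The label
    is 'D' when p exceeds the (hypothetical) threshold, otherwise 'O'.
    """
    lines = []
    for pos, aa in enumerate(sequence, start=1):
        p = p_disorder.get(pos, 0.5)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"residue {pos}: probability {p} outside 0.0-1.0")
        label = "D" if p > threshold else "O"
        lines.append(f"{aa} {label} {p:.2f}")
    return lines
```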

Residue-Residue separation prediction (PFRMAT RR).
Data in this format is inserted between MODEL and END records of the submission file.
Format for the predicted separation distance between pairs of residues. The distance is defined as the separation between C-beta atoms (C-alpha for glycine residues).
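The distance definition can be written down directly: take C-beta, or C-alpha for glycine, and compute the Euclidean distance. This Python sketch assumes each residue's atoms are given as a dictionary mapping atom name to (x, y, z). Treating a range as d1 <= d < d2 is an assumption chosen to match the 'Cb-Cb < 8 A' contact definition; the helper names are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def representative_atom(res_name: str, atoms: dict[str, Coord]) -> Coord:
    """C-beta, or C-alpha for glycine (which has no C-beta)."""
    return atoms["CA" if res_name.upper() == "GLY" else "CB"]


def rr_distance(res_a: str, atoms_a: dict[str, Coord],
                res_b: str, atoms_b: dict[str, Coord]) -> float:
    """Cb-Cb distance (Ca for glycines) between two residues, in Angstroms."""
    return math.dist(representative_atom(res_a, atoms_a),
                     representative_atom(res_b, atoms_b))


def in_range(distance: float, d1: float = 0.0, d2: float = 8.0) -> bool:
    """True if the distance lies in [d1, d2); the defaults define a contact."""
    return d1 <= distance < d2
```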

It is STRONGLY recommended that the full flexibility of the format is not used, to allow a simple and uniform evaluation.
Thus values of d1 = 0 and d2 = 8 are recommended. If it is planned to submit using other distance ranges (d1,d2) we request that a corresponding prediction with only the (0,8) ranges is submitted as model "1", and the original prediction as model "2" with the appropriate explanation in the REMARK field regarding the relation to model "1".

The (five column) RR format:

   i  j  d1  d2  p

   Notes (see Example 3):
     - entire target sequence should be split over multiple lines with a
       maximum of 50 residues per line
     - for intrachain residue-residue contacts residue number indices 
       i and j should be used for distance specification (i < j), i.e. 
       only one diagonal of the separation matrix should be supplied
     - the distances d1 and d2 (real numbers) should indicate the range of 
       Cb-Cb distance predicted for the residue pair (C-alpha for glycines)
     - the real number p should range from 0.0 - 1.0 to indicate
       probability of the distance falling between the predicted range
     - residue 'contacts' (defined here - as in CASP2 - as Cb-Cb<8A) can be 
       predicted with this format as:
         i  j  0  8  p
     - any pair NOT listed is assumed to be NOT considered by predictor
     - to evaluate the subset of residue-residue separation distances that
       represent 'contacts', 4 separation interval bins will be used (as
       in CASP2) (separation is calculated along the chain as a number of
       residues between the residues in contact):
         1 residue or more     : 1-9999
         from 1 to 4 residues  : 1-4
         from 5 to 8 residues  : 5-8
         9 residues or more    : 9-9999
       Example: Let's assume that two residues are in contact and there
       are 6 residues in between them along the chain. This contact will
       be classified as belonging to 5-8 separation interval bin and will
       not be counted in 1-4 and 9-9999 bins
     - in addition, in the evaluation of each prediction, 'p' will be
       compared to what would be expected from random, i.e. the likelihood
       observed in the database of protein structures for a pair of
       residues with residue separation (distance) d1-d2; residue
       separation (sequence) j-i; protein size; types of residue i, j.
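The record checks and the separation bins above can be sketched in Python as follows. Following the worked example, the separation is taken to be the number of residues strictly between i and j (j - i - 1), so adjacent residues fall into no bin. This reading, the restriction to single-chain integer indices and the helper names are assumptions.

```python
from typing import NamedTuple


class RRRecord(NamedTuple):
    i: int
    j: int
    d1: float
    d2: float
    p: float


# Separation interval bins used for evaluating contacts.
BINS = {"1-9999": (1, 9999), "1-4": (1, 4), "5-8": (5, 8), "9-9999": (9, 9999)}


def parse_rr_record(line: str) -> RRRecord:
    """Parse and check an 'i j d1 d2 p' record (single-chain indices)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns: {line!r}")
    rec = RRRecord(int(fields[0]), int(fields[1]),
                   float(fields[2]), float(fields[3]), float(fields[4]))
    if rec.i >= rec.j:
        raise ValueError("i < j required: supply only one triangle of the matrix")
    if rec.d1 > rec.d2:
        raise ValueError("d1 must not exceed d2")
    if not 0.0 <= rec.p <= 1.0:
        raise ValueError("p must lie in 0.0-1.0")
    return rec


def separation_bins(i: int, j: int) -> list[str]:
    """Bins a contact between residues i < j is counted in."""
    between = j - i - 1  # residues strictly between i and j (assumed reading)
    return [name for name, (lo, hi) in BINS.items() if lo <= between <= hi]
```

For instance, a contact between residues 10 and 17 has 6 residues in between and is counted in the 1-9999 and 5-8 bins only, matching the example in the notes.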

END record is used for all predictions and indicates the end of a single model submission.

Predictions of multichain targets.
Atomic coordinates should contain chain IDs as provided in template files. In residue-residue contact predictions residue indices should be composed of chain ID and residue number, e.g. A2, B44 (see Example 5).

Example 1. Atomic coordinates (Tertiary Structure)

The primary CASP5 format used for comparative modeling, threading and ab initio submission categories.

(A) An example of comparative modeling prediction. As this model is labeled UNREFINED, submission of a REFINED model is also required.

PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1  UNREFINED
PARENT 1abc 1def_A
ATOM      1  N   GLU     1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU     1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU     1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU     1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU     1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU     1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU     1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU     1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU     1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR     2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR     2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR     2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR     2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR     2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR     2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR     2       6.526 -15.019   2.864  1.00  1.60
ATOM     17  N   VAL     3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL     3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL     3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL     3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL     3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL     3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL     3       4.664 -11.514  -3.573  1.00  1.70
ATOM     24  N   GLU     4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU     4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU     4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU     4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU     4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU     4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU     4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU     4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU     4       2.246 -19.551  -0.108  1.00  0.90
TER
END

(B) A model consisting of 2 independent structure segments (could be a target modeled from two PDB domains, where relative orientation is unknown; could be 2 fragments predicted by ab initio methods - ab initio example shown). In a single MODEL no residue should appear twice among all such segments.

PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
PARENT N/A
ATOM      1  N   GLU     1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU     1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU     1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU     1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU     1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU     1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU     1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU     1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU     1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR     2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR     2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR     2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR     2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR     2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR     2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR     2       6.526 -15.019   2.864  1.00  1.60
ATOM     24  N   GLU     4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU     4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU     4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU     4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU     4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU     4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU     4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU     4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU     4       2.246 -19.551  -0.108  1.00  0.90
TER
PARENT N/A
ATOM     17  N   VAL     3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL     3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL     3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL     3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL     3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL     3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL     3       4.664 -11.514  -3.573  1.00  1.70
TER
END

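(C) A prediction indicating that no structure in the present PDB is close enough to be used as a template for the entire target (PARENT NONE with n1 n2 omitted).

PFRMAT TS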
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
PARENT NONE
TER
END

Example 2. Secondary structure prediction

Example of secondary structure prediction.

Note to predictors: it may be interesting to predict the secondary structure of proteins even when a clear structural homologue is known for the target. In cases where the target sequence is divergent from the template, secondary structure prediction may be more accurate than that implied by the template, and vice versa.

PFRMAT SS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
H E 0.70           # <- residue code, 
L E 0.80           # <- secondary structure assignment code, 
E E 0.80           # <- the number specifying the associated 
G E 0.60           #    confidence level
S C 0.90
I E 0.50
G E 0.40
I E 0.60
L E 0.70
L C 0.50
K C 0.50
K H 0.90
H H 0.90
E H 0.90
I H 0.80
V H 0.70
F C 0.90
D C 0.90
G H 0.40
C C 0.40
END

Example 3. Residue-Residue contact prediction

The flexibility offered by the new format allows algorithms parameterized to predict any distance range to be used. Below is an example of how to use the new residue-residue separation distance format to submit a prediction of residue contacts defined as Cb-Cb distances < 8 A.

PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
HLEGSIGILLKKHEIVFDGC       # <- entire target sequence (up to 50 
HDFGRTYIWQMSD              #    residues per line)
1  9  0  8  0.70        
1 10  0  8  0.70           # <- indices of residues: i and j (integers), 
1 12  0  8  0.60           # <- the range of Cb-Cb distance predicted
1 14  0  8  0.20           #    for the residue pair: d1 and d2 (real),
1 15  0  8  0.10           # <- probability of the distance between 
1 17  0  8  0.30           #    Cb atoms being within the specified
1 19  0  8  0.50           #    range: p (real)
2  8  0  8  0.90
3  7  0  8  0.70
3 12  0  8  0.40
3 14  0  8  0.70
3 15  0  8  0.30
4  6  0  8  0.90
7 14  0  8  0.30
9 14  0  8  0.50
END
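The RR body consists of the target sequence, wrapped at no more than 50 residues per line, followed by one line per residue pair: i, j, d1, d2 and p. The Python sketch below writes such a body. It assumes each pair is listed once with i < j, as in the example, and that fields only need to be whitespace-separated (the column alignment in the example is cosmetic). The name `rr_body` is hypothetical.

```python
from typing import List, Sequence, Tuple

# (i, j, d1, d2, p): residue indices, Cb-Cb distance range, probability
Contact = Tuple[int, int, float, float, float]


def rr_body(sequence: str, contacts: Sequence[Contact], width: int = 50) -> List[str]:
    """Sequence lines (at most `width` residues each) followed by contact lines."""
    lines = [sequence[k:k + width] for k in range(0, len(sequence), width)]
    for i, j, d1, d2, p in contacts:
        # Assumption: each pair appears once, with i < j, as in the example.
        if not 1 <= i < j <= len(sequence):
            raise ValueError(f"bad residue pair ({i}, {j})")
        if not 0.0 <= d1 <= d2:
            raise ValueError(f"bad distance range [{d1}, {d2}]")
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} outside [0, 1]")
        lines.append(f"{i} {j} {d1:g} {d2:g} {p:.2f}")  # e.g. '1 9 0 8 0.70'
    return lines
```

Because d1 and d2 are explicit per line, the same writer serves any distance threshold; an 8 Å contact predictor simply passes d1 = 0 and d2 = 8.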

Example 4. An alternative alignment format for Threading/Fold Recognition predictions

Alignments will be converted into 3D structures.

(A) Format to express unambiguous alignments to PDB entries 'mabc_A' and 'nefg'.

PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
PARENT mabc_A
M  21    V  11 
P  22    D  12  
N  23    A  12A 
F  24    F  12B 
A  25    L  13  
P  32    D  22  
N  33    A  23 
F  34    F  24 
A  35    L  25  
TER
PARENT nefg
E  75    T  73   
T  76    T  74   
V  77    A  75  
D  78    D  76  
G  79    D  77  
R  80    R  78  
TER
END
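In the AL body, each PARENT line opens a block of aligned pairs (target residue and number, then template residue and PDB residue number, optionally followed by an insertion code such as the 'A' in '12A'), and TER closes the block. The Python sketch below groups the pairs by parent entry. It handles the plain target numbering of example (A) only; the regular expression and the names `AlignedPair` and `parse_al_body` are assumptions for illustration. Header lines are ignored because they precede the first PARENT line.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class AlignedPair(NamedTuple):
    target_res: str      # one-letter code of the target residue
    target_num: int      # target residue number
    template_res: str    # one-letter code of the template residue
    template_num: int    # template (PDB) residue number
    template_icode: str  # PDB insertion code, e.g. 'A' in '12A'; '' if none


# Target residue, number, template residue, number plus optional insertion code.
PAIR_RE = re.compile(r"^([A-Z])\s+(\d+)\s+([A-Z])\s+(-?\d+)([A-Z]?)$")


def parse_al_body(lines: Iterable[str]) -> Dict[str, List[AlignedPair]]:
    """Group aligned pairs by PARENT entry; a TER line closes the current block."""
    blocks: Dict[str, List[AlignedPair]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("PARENT"):
            current = line.split()[1]
            blocks[current] = []
        elif line == "TER":
            current = None
        elif current is not None:
            m = PAIR_RE.match(line)
            if m is None:
                raise ValueError(f"unrecognised alignment line: {raw!r}")
            t_res, t_num, p_res, p_num, icode = m.groups()
            blocks[current].append(
                AlignedPair(t_res, int(t_num), p_res, int(p_num), icode))
    return blocks
```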

(B) Format to express unambiguous alignments to PDB entry 'mabc_D'. This example shows how to use the AL format to submit a prediction for a target whose chain identifier is 'A'.

PFRMAT AL
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
PARENT mabc_D
M  A21    V  11 
P  A22    D  12  
N  A23    A  12A 
F  A24    F  12B 
A  A25    L  13  
P  A32    D  22  
N  A33    A  23 
F  A34    F  24 
A  A35    L  25  
TER
END
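In example (B) a residue-number field can carry a leading chain letter on the target side ('A21') and a trailing insertion code on the template side ('12A'). A small Python sketch that splits such fields is given below. It assumes chain letters only ever precede the number and insertion codes only follow it; the names `parse_residue_field` and `parse_al_pair` are hypothetical.

```python
import re
from typing import Tuple

# Optional leading chain letter, residue number, optional trailing insertion code.
# Assumption: chains only precede the number, insertion codes only follow it.
FIELD_RE = re.compile(r"^([A-Za-z]?)(-?\d+)([A-Za-z]?)$")

Residue = Tuple[str, str, int, str]  # (one-letter code, chain, number, insertion code)


def parse_residue_field(field: str) -> Tuple[str, int, str]:
    """Split e.g. 'A21' -> ('A', 21, '') and '12A' -> ('', 12, 'A')."""
    m = FIELD_RE.match(field)
    if m is None:
        raise ValueError(f"unrecognised residue field: {field!r}")
    chain, number, icode = m.groups()
    return chain, int(number), icode


def parse_al_pair(line: str) -> Tuple[Residue, Residue]:
    """Parse one 'target_res target_field template_res template_field' line."""
    t_res, t_field, p_res, p_field = line.split()
    return (t_res, *parse_residue_field(t_field)), (p_res, *parse_residue_field(p_field))
```

For the line `F  A24    F  12B`, this gives target residue F in chain A at position 24, aligned to template residue F at 12 with insertion code B.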

Example 5. Predictions of multichain targets (chains A and B)

(A) An example of a 3D atomic coordinate model prediction.

PFRMAT TS
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1 
PARENT N/A
ATOM     17  N   VAL A   3       6.308 -12.396  -0.278  1.00  1.70
ATOM     18  CA  VAL A   3       5.190 -12.030  -1.187  1.00  1.70
ATOM     19  C   VAL A   3       3.954 -12.870  -0.844  1.00  1.70
ATOM     20  O   VAL A   3       2.834 -12.471  -1.090  1.00  1.70
ATOM     21  CB  VAL A   3       5.608 -12.274  -2.641  1.00  1.70
ATOM     22  CG1 VAL A   3       5.542 -13.771  -2.959  1.00  1.70
ATOM     23  CG2 VAL A   3       4.664 -11.514  -3.573  1.00  1.70
ATOM     24  N   GLU A   4       4.146 -14.029  -0.272  1.00  1.70
ATOM     25  CA  GLU A   4       2.976 -14.882   0.086  1.00  1.60
ATOM     26  C   GLU A   4       2.153 -14.190   1.175  1.00  1.50
ATOM     27  O   GLU A   4       0.942 -14.141   1.109  1.00  1.40
ATOM     28  CB  GLU A   4       3.465 -16.238   0.597  1.00  1.30
ATOM     29  CG  GLU A   4       2.336 -17.264   0.479  1.00  1.20
ATOM     30  CD  GLU A   4       2.929 -18.671   0.391  1.00  1.10
ATOM     31  OE1 GLU A   4       4.056 -18.846   0.823  1.00  1.00
ATOM     32  OE2 GLU A   4       2.246 -19.551  -0.108  1.00  0.90
REMARK
REMARK  NOTE: Predictors should NOT use a TER separator between chains
REMARK        if an independent multichain segment of the structure is
REMARK        to be evaluated as a single fragment.
REMARK
ATOM      1  N   GLU B   1      10.982  -9.774   1.377  1.00  0.50
ATOM      2  CA  GLU B   1       9.623  -9.833   1.984  1.00  0.50
ATOM      3  C   GLU B   1       8.913 -11.104   1.521  1.00  0.50
ATOM      4  O   GLU B   1       9.187 -11.630   0.461  1.00  0.50
ATOM      5  CB  GLU B   1       8.814  -8.614   1.546  1.00  0.50
ATOM      6  CG  GLU B   1       7.372  -8.754   2.039  1.00  0.50
ATOM      7  CD  GLU B   1       7.339  -8.625   3.562  1.00  0.50
ATOM      8  OE1 GLU B   1       8.370  -8.307   4.131  1.00  0.50
ATOM      9  OE2 GLU B   1       6.284  -8.846   4.132  1.00  0.50
ATOM     10  N   THR B   2       7.998 -11.599   2.304  1.00  1.60
ATOM     11  CA  THR B   2       7.266 -12.832   1.907  1.00  1.60
ATOM     12  C   THR B   2       6.096 -12.456   1.005  1.00  1.60
ATOM     13  O   THR B   2       5.008 -12.217   1.466  1.00  1.60
ATOM     14  CB  THR B   2       6.731 -13.533   3.157  1.00  1.60
ATOM     15  OG1 THR B   2       7.662 -13.379   4.220  1.00  1.60
ATOM     16  CG2 THR B   2       6.526 -15.019   2.864  1.00  1.60
TER
END
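In the multichain model, the chain identifier sits in its own column of each ATOM record, and residue numbering restarts for chain B. The Python sketch below reads the records by fixed column position and groups atoms by chain, skipping TER, REMARK and other lines. It assumes the records follow the standard PDB fixed-column ATOM layout, which the example lines appear to match; `Atom`, `parse_atom` and `atoms_by_chain` are hypothetical names. Slicing by column, rather than splitting on whitespace, also copes with single-chain records whose chain column is blank.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str        # '' when the chain column is blank
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float


def parse_atom(line: str) -> Atom:
    """Slice a fixed-column PDB-style ATOM record (0-based Python slices)."""
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21:22].strip(),     # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        occupancy=float(line[54:60]),  # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
    )


def atoms_by_chain(lines: Iterable[str]) -> Dict[str, List[Atom]]:
    """Collect ATOM records per chain; TER, REMARK and other lines are skipped."""
    chains: Dict[str, List[Atom]] = defaultdict(list)
    for line in lines:
        if line.startswith("ATOM"):
            atom = parse_atom(line)
            chains[atom.chain].append(atom)
    return dict(chains)
```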

(B) An example of how to use the RR format to submit a prediction of interchain (chains A and B) residue-residue contacts defined as Cb-Cb distances < 8 Å.

PFRMAT RR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
HLEGSIGILLKKHEIVFDGC         # <- entire target sequence (up to 50 
HDFGRTYIWQMSD                #    residues per line)
A1 B9   0  8  0.70        
A1 B10  0  8  0.70           # <- indices of residues: Ai and Bj, 
A1 B12  0  8  0.60           # <- the range of Cb-Cb distance predicted
A1 B14  0  8  0.20           #    for the residue pair: d1 and d2 (real),
A1 B15  0  8  0.10           # <- probability of the distance between 
A1 B17  0  8  0.30           #    Cb atoms being within the specified
A1 B19  0  8  0.50           #    range: p (real)
A2 B8   0  8  0.90
A3 B7   0  8  0.70
A3 B12  0  8  0.40
A3 B14  0  8  0.70
A3 B15  0  8  0.30
A4 B6   0  8  0.90
A7 B14  0  8  0.30
A9 B14  0  8  0.50
END
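For interchain contacts, each residue index is prefixed with its chain letter ('A1', 'B9'), and the remaining fields d1, d2 and p keep the same meaning as in the single-chain RR format. The Python sketch below parses one such line. It assumes a single-letter chain identifier and strips the explanatory '# <- ...' annotations of the example if present; `InterchainContact` and `parse_interchain_contact` are hypothetical names.

```python
import re
from typing import NamedTuple

RES_RE = re.compile(r"^([A-Za-z])(\d+)$")  # chain letter followed by residue number


class InterchainContact(NamedTuple):
    chain_i: str
    res_i: int
    chain_j: str
    res_j: int
    d1: float  # lower bound of the Cb-Cb distance range
    d2: float  # upper bound of the Cb-Cb distance range
    p: float   # probability that the distance lies within [d1, d2]


def parse_interchain_contact(line: str) -> InterchainContact:
    # Drop the explanatory '# <- ...' annotation used in the example, if present.
    fields = line.split("#", 1)[0].split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    mi, mj = RES_RE.match(fields[0]), RES_RE.match(fields[1])
    if mi is None or mj is None:
        raise ValueError(f"residue fields must look like 'A12': {line!r}")
    d1, d2, p = (float(x) for x in fields[2:])
    return InterchainContact(mi.group(1), int(mi.group(2)),
                             mj.group(1), int(mj.group(2)), d1, d2, p)
```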

Example 6. Order-disorder regions prediction

Example of order-disorder regions prediction.

PFRMAT DR
TARGET Txxxx
AUTHOR xxxx-xxxx-xxxx
REMARK Predictor remarks
METHOD Description of methods used
METHOD Description of methods used
METHOD Description of methods used
MODEL  1
H D 0.70           # <- residue code,
L D 0.80           # <- order/disorder assignment code,
E D 0.80           # <- the number specifying the associated
G D 0.60           #    confidence level: 0.5 - residue not predicted 
S D 0.90           #                     >0.5 - disordered region 
I O 0.50           #                     <0.5 - ordered region
G O 0.40
I O 0.40
L O 0.30
L O 0.50
K O 0.50
K O 0.30
H O 0.20
E O 0.20
I O 0.40
V O 0.45
F D 0.60
D D 0.90
G D 0.60
C D 0.80
END
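The annotation in the DR example ties the confidence value to a label: exactly 0.5 means the residue is not predicted, values above 0.5 mean disordered, and values below 0.5 mean ordered. The Python sketch below applies those thresholds and flags body lines whose D/O code contradicts the number; lines at exactly 0.5 are never flagged. The helper names are hypothetical, and the input is assumed to be only the body lines between MODEL and END.

```python
from typing import Iterable, List


def classify_dr(p: float) -> str:
    """Map a confidence value to a label using the thresholds in the example."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"confidence {p} outside [0, 1]")
    if p > 0.5:
        return "disordered"
    if p < 0.5:
        return "ordered"
    return "not predicted"


def inconsistent_lines(body: Iterable[str]) -> List[int]:
    """1-based indices of body lines whose D/O code contradicts the confidence."""
    expected_code = {"disordered": "D", "ordered": "O"}
    bad = []
    for n, raw in enumerate(body, start=1):
        # Strip any '# <- ...' annotation, then split residue, code, confidence.
        _residue, code, conf = raw.split("#", 1)[0].split()
        expected = expected_code.get(classify_dr(float(conf)))
        if expected is not None and code != expected:
            bad.append(n)
    return bad
```

Run over the example body above, `inconsistent_lines` returns an empty list: every D line has a confidence above 0.5 and every O line has 0.5 or below.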