Details of automatic evaluation of CASP3 Comparative Modeling predictions


Automatic evaluation of comparative modeling predictions includes measures related to:

  I.   angles
  II.  coordinates
  III. RMS details of loops
  IV.  accuracy of error estimates

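Values for angles, coordinates and accuracy of error estimates are provided for the following subsets of the structure:

  "ALL"
  "WELL ORDERED"
  "NO INTERMOL CONTACTS"
  "RELIABLE SIDE CHAINS"
  "CHANGED ANGLES"
  "SHIFTED CHAIN"
  "ALTERNATIVE PARENT"
  "SECONDARY STRUCTURE"
  "SHIFTED SS UNITS"
  "LARGE SHIFTS/INSERTIONS"
  "LIGAND CONTACTS"
  "SURFACE"
  "BURIED"
  "CORE"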

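In addition, the quality of a prediction is evaluated with measures based on:

  distance cutoffs
  the presence of structural alignment between target and principal parent
  alignment accuracy (lowest RMS sequence-independent superposition)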


MEASURES


I. ANGLES (RMS difference between model and target in dihedral angles)

In a given subset:

  ARMSMC        denotes dihedral angle rms for phi and psi angles. 
  ARMSSC        denotes dihedral angle rms for chi angles.
  ARMSSC1       denotes dihedral angle rms for chi1 angles.
  ARMSSC2       denotes dihedral angle rms for chi2 angles.
  ARMSSC3       denotes dihedral angle rms for chi3 angles.
  ARMSSC4       denotes dihedral angle rms for chi4 angles.

  ANGMC_NP      denotes the number of main chain dihedral angles in the submitted prediction. 
  ANGSC1_NP     denotes the number of chi1 dihedral angles in the submitted prediction. 
  ANGSC2_NP     denotes the number of chi2 dihedral angles in the submitted prediction. 
  ANGSC3_NP     denotes the number of chi3 dihedral angles in the submitted prediction. 
  ANGSC4_NP     denotes the number of chi4 dihedral angles in the submitted prediction. 
  ANGMC_TN      denotes the total number of main chain dihedral angles in the target structure. 
  ANGSC1_TN     denotes the total number of chi1 dihedral angles in the target structure. 
  ANGSC2_TN     denotes the total number of chi2 dihedral angles in the target structure. 
  ANGSC3_TN     denotes the total number of chi3 dihedral angles in the target structure. 
  ANGSC4_TN     denotes the total number of chi4 dihedral angles in the target structure. 
  ANGMC_PP      denotes percent of main chain dihedral angles it is possible to evaluate in 
                the submitted prediction, i.e. ANGMC_NP/ANGMC_TN. 
  ANGSC1_PP     denotes percent of chi1 dihedral angles it is possible to evaluate in 
                the submitted prediction, i.e. ANGSC1_NP/ANGSC1_TN. 
  ANGSC2_PP     denotes percent of chi2 dihedral angles it is possible to evaluate in 
                the submitted prediction, i.e. ANGSC2_NP/ANGSC2_TN. 
  ANGSC3_PP     denotes percent of chi3 dihedral angles it is possible to evaluate in 
                the submitted prediction, i.e. ANGSC3_NP/ANGSC3_TN. 
  ANGSC4_PP     denotes percent of chi4 dihedral angles it is possible to evaluate in 
                the submitted prediction, i.e. ANGSC4_NP/ANGSC4_TN. 
  ANGMC_PC      denotes percent of main chain dihedral angles correct, i.e. with error 
                smaller than cutoff (30 degrees) and relative to ANGMC_NP. 
  ANGSC1_PC     denotes percent of chi1 dihedral angles correct, i.e. with error 
                smaller than cutoff (30 degrees) and relative to ANGSC1_NP. 
  ANGSC2_PC     denotes percent of chi2 dihedral angles correct, i.e. with error 
                smaller than cutoff (30 degrees) and relative to ANGSC2_NP. 
  ANGSC3_PC     denotes percent of chi3 dihedral angles correct, i.e. with error 
                smaller than cutoff (30 degrees) and relative to ANGSC3_NP. 
  ANGSC4_PC     denotes percent of chi4 dihedral angles correct, i.e. with error 
                smaller than cutoff (30 degrees) and relative to ANGSC4_NP. 
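
A small worked sketch of how the _PP and _PC percentages follow from the counts defined above, for a single angle class. The function name and the sample numbers are hypothetical; only the 30-degree cutoff and the denominators (_TN for _PP, _NP for _PC) come from the definitions.

```python
def coverage_and_correct(n_predicted, n_target, errors_deg, cutoff=30.0):
    """Return (PP, PC) for one angle class, e.g. ANGMC_PP and ANGMC_PC.

    n_predicted -- angles that can be evaluated in the model (the _NP count)
    n_target    -- angles in the target structure (the _TN count)
    errors_deg  -- absolute model-target errors of the evaluable angles
    """
    pp = 100.0 * n_predicted / n_target
    # PC is relative to the evaluable angles (_NP), not to the target total (_TN)
    n_correct = sum(1 for e in errors_deg if e < cutoff)
    pc = 100.0 * n_correct / n_predicted if n_predicted else 0.0
    return pp, pc

# 4 of 5 target angles evaluable; 3 of those 4 are within 30 degrees
print(coverage_and_correct(4, 5, [10.0, 25.0, 29.9, 45.0]))  # (80.0, 75.0)
```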

Comments:

  1. Dihedral angles are calculated in degrees and belong to the interval
     [-180, 180].
  2. Only the angles calculated based on the atoms provided in the target 
     structure are included.
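  3. For each subset, RMS is calculated by the formula SQRT(SUM(z*z)/n),
     where z is the difference between the model and target values of an
     angle and n is the number of angles in the subset.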
  4. A dihedral angle is calculated only if all four atoms involved fall
     into the given subset. An exception is made for main chain angles
     (phi and psi, for subsets other than ALL): the first or fourth atom
     of the dihedral does not have to belong to the given subset.
  5. When calculating chi2 and chi3 angles, "swapping" can optionally be
     considered. For amino acids whose atom names can be interchanged, i.e.
       for chi2 in amino acids PHE:  CD1 <-> CD2
                               TYR:  CD1 <-> CD2
                               ASP:  OD1 <-> OD2
       for chi3 in amino acid  GLU:  OE1 <-> OE2
     the angular rms is calculated with the option of taking the value more
     favorable to the predictor, obtained by exchanging atoms CD1 and CD2 in
     PHE and TYR, OD1 and OD2 in ASP, and OE1 and OE2 in GLU.
  6. When calculating chi angles, the following residues are considered:
       for chi1:  VAL, LEU, ILE, PRO, MET, PHE, TRP, SER, THR,
                  CYS, TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
       for chi2:       LEU, ILE, PRO, MET, PHE, TRP,
                       TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
       for chi3:                      MET,
                                 GLN,      GLU, LYS, ARG 
       for chi4:                                LYS, ARG 
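
Comments 1, 3 and 5 can be illustrated with the Python sketch below. It is not the Prediction Center's evaluation code. Two points are assumptions not stated in the text: each model-target angle difference z is wrapped into [-180, 180) before squaring, and the optional swap is modeled by recomputing the model chi angle with the exchanged atom and keeping the smaller error. All function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # project b0 and b2 onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def angle_diff(a, b):
    """a - b wrapped into [-180, 180) degrees (assumed handling of periodicity)."""
    return (a - b + 180.0) % 360.0 - 180.0

def angular_rms(z):
    """SQRT(SUM(z*z)/n) over wrapped angle differences z."""
    return math.sqrt(sum(x * x for x in z) / len(z))

def chi_error(target_chi, model_chi, model_chi_swapped=None):
    """|z| for one chi angle; with swapping, keep the value better for the predictor."""
    err = abs(angle_diff(model_chi, target_chi))
    if model_chi_swapped is not None:
        err = min(err, abs(angle_diff(model_chi_swapped, target_chi)))
    return err
```

For chi2 of a PHE residue, for example, model_chi would come from the model's CA, CB, CG and CD1 atoms and model_chi_swapped from CA, CB, CG and CD2.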

II. COORDINATES (RMS difference between model and target in atom positions)

  CRN        denotes rms for C-alpha atoms / number of predicted C-alpha atoms, 
                i.e. CRMSCA/NP
In a given subset:
  CRMSCA     denotes rms difference for C-alpha atoms. 
  CRMSMC     denotes rms difference for main chain and C-beta atoms. 
  CRMSSC     denotes rms difference for side chain atoms. 
  CRMSALL    denotes rms difference for all atoms.
 
  ATOMCA_NP  denotes the number of CA atoms in the submitted prediction. 
  ATOMMC_NP  denotes the number of main chain and C-beta atoms in the submitted prediction. 
  ATOMSC_NP  denotes the number of side chain atoms in the submitted prediction. 
  ATOMALL_NP denotes the number of all atoms in the submitted prediction. 
  ATOMCA_TN  denotes the total number of CA atoms in the target structure. 
  ATOMMC_TN  denotes the total number of main chain and C-beta atoms in the target structure. 
  ATOMSC_TN  denotes the total number of side chain atoms in the target structure. 
  ATOMALL_TN denotes the total number of all atoms in the target structure. 
  ATOMCA_PP  denotes percent of CA atoms it is possible to evaluate in the 
             submitted prediction, i.e. ATOMCA_NP/ATOMCA_TN
  ATOMMC_PP  denotes percent of main chain and C-beta atoms it is possible to evaluate 
             in the submitted prediction, i.e. ATOMMC_NP/ATOMMC_TN
  ATOMSC_PP  denotes percent of side chain atoms it is possible to evaluate 
             in the submitted prediction, i.e. ATOMSC_NP/ATOMSC_TN
  ATOMALL_PP denotes percent of all atoms it is possible to evaluate 
             in the submitted prediction, i.e. ATOMALL_NP/ATOMALL_TN

Comments:

  1. Only the atoms provided in the target structure are included.
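  2. For each subset, RMS in Angstroms is calculated by the formula
     SQRT(SUM(z*z)/n), where z is the distance between corresponding model
     and target atoms after superposition and n is the number of atoms.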
  3. Global superposition of model and target can be calculated based on the
     positions of (default: d):
       a) all atoms
       b) C-alpha atoms
       c) main chain including C-beta atoms
       d) C-alpha atoms using iterative superposition procedure (ISP) with 
          cutoff (default 2.5 Å)
  4. The goal of the ISP method is to exclude from the calculations those
     atoms whose distance between the model and the target structure exceeds
     a threshold (cutoff) after the transform is applied.
     Starting from the initial set of atoms (C-alphas) the algorithm is the 
     following:
       a) obtain the transform
       b) apply the transform
       c) identify all atom pairs for which distance is larger than the threshold
       d) re-obtain the transform, excluding those atoms
       e) repeat b) - d) until the set of atoms used in the calculations
          is the same for two consecutive cycles
  5. Global superposition of parent and target is calculated based on the
     position of C-alpha atoms
  6. Local superposition of model and target is calculated based on the
     position of the local set of C-alpha atoms (except "LIGAND CONTACTS" 
     subset where all atoms being in contact are used)
  7. At least 3 atoms must be available to calculate a superposition.
  8. When calculating RMS, "swapping" can optionally be considered. For
     amino acids whose atom names can be interchanged, i.e.
       for ASP: OD1 <-> OD2
       for GLU: OE1 <-> OE2
       for PHE: CD1 <-> CD2
                CE1 <-> CE2
       for TYR: CD1 <-> CD2
                CE1 <-> CE2
     the cartesian rms is calculated with the option of taking the value more
     favorable to the predictor: sets (CD1, CE1) and (CD2, CE2) in PHE and
     TYR, as well as atoms OD1 and OD2 in ASP and OE1 and OE2 in GLU, are
     exchanged, and the more favorable contributions to the rms are taken
     into account.
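
The iterative superposition procedure of comment 4 can be sketched as below. The text does not say which least-squares fitting method is used; this sketch assumes the Kabsch method via NumPy's SVD. Two further points are assumptions: the distance test in step c) is reapplied to all pairs each cycle, so previously excluded atoms may re-enter, and if fewer than three atoms would remain (comment 7), the last fit is kept. Function names are hypothetical.

```python
import numpy as np

def kabsch(mobile, fixed):
    """Least-squares rotation r and translation t so that mobile @ r.T + t ~ fixed."""
    mc, fc = mobile.mean(axis=0), fixed.mean(axis=0)
    h = (mobile - mc).T @ (fixed - fc)            # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # -1 would mean a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, fc - r @ mc

def isp_superpose(model_ca, target_ca, cutoff=2.5, max_cycles=100):
    """Iterative superposition procedure (ISP) on paired CA coordinates."""
    model_ca = np.asarray(model_ca, dtype=float)
    target_ca = np.asarray(target_ca, dtype=float)
    if len(model_ca) < 3:
        raise ValueError("at least 3 atoms are needed for a superposition")
    used = np.ones(len(model_ca), dtype=bool)      # start from all C-alphas
    for _ in range(max_cycles):                    # guard against oscillation
        r, t = kabsch(model_ca[used], target_ca[used])   # a) obtain transform
        moved = model_ca @ r.T + t                       # b) apply it
        dist = np.linalg.norm(moved - target_ca, axis=1)
        new_used = dist <= cutoff                  # c) drop pairs beyond cutoff
        if new_used.sum() < 3 or np.array_equal(new_used, used):
            break                                  # e) same set twice: done
        used = new_used                            # d) refit without them
    return r, t, used

def rms(a, b):
    """SQRT(SUM(z*z)/n), z = distance between corresponding atoms (Angstroms)."""
    d = np.linalg.norm(np.asarray(a, float) - np.asarray(b, float), axis=1)
    return float(np.sqrt(np.mean(d * d)))
```

Applying the returned transform as `model_ca @ r.T + t` puts the model in the target frame, after which `rms` gives the CRMS values for any atom subset.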

III. RMS DETAILS OF LOOPS

Cartesian rms in both global and local superposition has been calculated here on C-alpha (CA), main chain (MC), and all atoms (ALL) for each loop that contains at least three residues. The loop definition is as provided below (see "LARGE SHIFTS/INSERTIONS").

In a given loop:

  NP  denotes the number of atoms in the submitted prediction. 
  TN  denotes the total number of atoms in the target structure. 
  PP  denotes percent of atoms it is possible to evaluate in the submitted 
      prediction, i.e. NP/TN

IV. ACCURACY OF ERROR ESTIMATES

In a given subset:

  ERRCA  denotes error estimates for C-alpha atoms. 
  ERRMC  denotes error estimates for main chain and C-beta atoms. 
  ERRSC  denotes error estimates for side chain atoms. 
  ERRALL denotes error estimates for all atoms. 
  NP     denotes the number of atoms in the submitted prediction. 
  TN     denotes the total number of atoms in the target structure. 
  PP     denotes percent of atoms it is possible to evaluate in the submitted 
         prediction, i.e. NP/TN. 

Comments:

  Accuracy of estimates of atomic coordinate errors between model and target
  is calculated using the following formulas:
  |D-E|         mean deviation between observed distance in atomic 
                positions (D) and the estimated one (E) in Angstroms:
 
                |D-E| = \frac{1}{N} \sum_{i=1}^{N} |D(i) - E(i)|

                where D(i) is the distance between the model and target
                positions of atom i, E(i) is the predictor-provided error
                estimate for that atom, and the sum runs over all N atoms in
                the given subset.

  |D-E|/|D+E|   mean value of the normalized deviation: 

                |D-E|/|D+E| = \frac{1}{N} \sum_{i=1}^{N} \frac{|D(i) - E(i)|}{|D(i)| + |E(i)|}

                Since 0 <= |D(i)-E(i)| <= |D(i)|+|E(i)| holds for each atom,
                every term lies between 0 and 1, so this measure approaches 0
                for correctly estimated errors and 1 for wrong error judgments.
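
The two formulas above translate directly into code. In this sketch, which is not the official implementation, D and E are plain lists of per-atom values in Angstroms, and an atom with D(i) = E(i) = 0, for which the normalized term is 0/0, is assumed to contribute 0. The function name is hypothetical.

```python
def error_estimate_accuracy(d, e):
    """Return (|D-E|, |D-E|/|D+E|) averaged over the N atoms of a subset.

    d -- observed model-target distances D(i) in Angstroms
    e -- predictor-supplied error estimates E(i) in Angstroms
    """
    n = len(d)
    mean_dev = sum(abs(di - ei) for di, ei in zip(d, e)) / n
    # assumption: an atom with D(i) = E(i) = 0 (a 0/0 term) contributes 0
    norm_dev = sum(abs(di - ei) / (abs(di) + abs(ei)) if (di or ei) else 0.0
                   for di, ei in zip(d, e)) / n
    return mean_dev, norm_dev

# one perfect estimate (term 0) and one estimate of 0 for a 2 A error (term 1)
print(error_estimate_accuracy([1.0, 2.0], [1.0, 0.0]))  # (1.0, 0.5)
```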


SUBSET DESCRIPTIONS


"ALL"

All atoms or dihedral angles possible to evaluate are considered.

Comments:
  1. Inclusion is evaluated per atom
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       LOOP rms:      CRMSCA CRMSMC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"WELL ORDERED"

In the case of crystallographically determined structures, this subset selects the parts of the target structure that are not affected by the uncertainty associated with thermal motion or disorder. Target structure temperature factors and reported alternative atomic positions are used as discriminators. If the temperature factor of an atom is greater than the cutoff (presently 20 Å²), the atom is not included. Similarly, when an alternative position is listed for an atom, that atom is not included either.
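
A minimal sketch of the per-atom WELL ORDERED test applied to a target file in fixed-column PDB format, where the alternate-location indicator is column 17 and the temperature factor is columns 61-66. The function name is hypothetical, only ATOM records are read, and any non-blank alternate-location indicator is taken to mean that an alternative position is listed.

```python
def well_ordered_atoms(pdb_lines, b_cutoff=20.0):
    """Yield (residue number, atom name) for ATOM records kept in WELL ORDERED."""
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        altloc = line[16]                 # PDB column 17
        bfactor = float(line[60:66])      # PDB columns 61-66
        if altloc.strip() or bfactor > b_cutoff:
            continue                      # disordered or alternative position
        yield int(line[22:26]), line[12:16].strip()
```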

Comments:

  1. Inclusion is evaluated per atom
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"NO INTERMOL CONTACTS"

This subset selects the parts of the target structure that are not affected by interactions with neighboring molecules (crystal contacts). If an atom is in an intermolecular contact (presently defined as a distance smaller than 4 Å, calculated with J. Moult's CONANA), it is not included in this subset.
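
CONANA itself is not reproduced here. As a stand-in under that assumption, a brute-force distance test shows the per-atom criterion: an atom is in contact if any atom of a neighboring molecule lies closer than 4 Å. The function names are hypothetical, and the coordinates of the neighboring (for example symmetry-related) molecules are assumed to be already available.

```python
import math

def contact_indices(atoms, neighbors, cutoff=4.0):
    """Indices of atoms closer than cutoff (Angstroms) to any neighbor atom."""
    return {i for i, p in enumerate(atoms)
            if any(math.dist(p, q) < cutoff for q in neighbors)}

def no_intermol_contacts(atoms, neighbors, cutoff=4.0):
    """Indices of atoms kept in the NO INTERMOL CONTACTS subset."""
    excluded = contact_indices(atoms, neighbors, cutoff)
    return [i for i in range(len(atoms)) if i not in excluded]
```

Passing ligand atoms as `neighbors` gives the same kind of 4 Å contact test that the LIGAND CONTACTS subset relies on.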

Comments:

  1. Inclusion is evaluated per atom 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"RELIABLE SIDE CHAINS"

This subset excludes from cartesian and angular RMS calculations the side-chain segments deemed crystallographically unreliable (a rotation of 180 degrees could be undetectable):

  from residue HIS:  atoms ND1 CD2 CE1 NE2        angle chi2
  from residue ASN:  atoms OD1 ND2                angle chi2      
  from residue GLN:  atoms OE1 NE2                angle chi3
  from residue VAL:  atoms CG1 CG2                angle chi1
  from residue LEU:  atoms CD1 CD2                angle chi2
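
The exclusion table above is small enough to encode as a lookup. A hypothetical sketch of how an evaluation script might apply it per atom and per angle; the dictionary and function names are illustrative only.

```python
# Hypothetical encoding of the RELIABLE SIDE CHAINS exclusion table
UNRELIABLE = {
    "HIS": ({"ND1", "CD2", "CE1", "NE2"}, "chi2"),
    "ASN": ({"OD1", "ND2"}, "chi2"),
    "GLN": ({"OE1", "NE2"}, "chi3"),
    "VAL": ({"CG1", "CG2"}, "chi1"),
    "LEU": ({"CD1", "CD2"}, "chi2"),
}

def reliable_atom(res_name, atom_name):
    """True if the atom stays in the RELIABLE SIDE CHAINS subset."""
    excluded_atoms, _ = UNRELIABLE.get(res_name, (set(), None))
    return atom_name not in excluded_atoms

def reliable_angle(res_name, angle_name):
    """True if the dihedral angle stays in the RELIABLE SIDE CHAINS subset."""
    _, excluded_angle = UNRELIABLE.get(res_name, (set(), None))
    return angle_name != excluded_angle
```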

Comments:

  1. The atoms and dihedral angles listed above are excluded. 
  2. Measures:
       angular rms:   ARMSSC
       cartesian rms: CRMSSC
       ERRORS:        ERRSC

"CHANGED ANGLES"

This subset selects angles that are rotamerically different from the corresponding ones in the parent structure. The goal here is to select a set of dihedral angles that have been particularly difficult to predict correctly. For each global alignment (DALI), corresponding side chain dihedrals in target and parent structure are checked for consistency. Discrepancies greater than cutoff (presently 30 degrees) are marked and each corresponding chi1 - chi4 is included.
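
Assuming the DALI alignment has already paired each target residue with its parent residue, the per-residue check can be sketched as below. The dictionaries of chi angles keyed by name and the wrap-around of the difference into [-180, 180) are assumptions; the 30-degree cutoff is from the text. The function name is hypothetical.

```python
def changed_angles(target_chis, parent_chis, cutoff=30.0):
    """Names of chi angles whose target-parent discrepancy exceeds cutoff (degrees)."""
    changed = []
    for name, t in target_chis.items():
        p = parent_chis.get(name)
        if p is None:
            continue                              # angle absent in the parent
        diff = abs((t - p + 180.0) % 360.0 - 180.0)   # wrapped difference (assumed)
        if diff > cutoff:
            changed.append(name)
    return changed
```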

Comments:

  1. Inclusion is evaluated per atom 
  2. Measures:
       angular rms:   ARMSSC
       cartesian rms: CRMSSC
       ERRORS:        ERRSC

"SHIFTED CHAIN"

This subset selects segments of the target structure whose position differs by more than a cutoff in the global alignment. Again, the goal is to mark segments of the structure that were particularly difficult to predict. For each global alignment (DALI) between the target and parent structures, residues whose corresponding CA atoms are more than 1 Angstrom apart are included.

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"ALTERNATIVE PARENT"

This subset selects segments of target structure for which selection of a parent other than the most obvious one has been correct. The purpose is to mark segments for which making a non-trivial selection of the template produces better results. For global alignments (DALI) between structures of target and principal parent (by sequence identity) as well as target and each alternative parent, residues for which the following is true are included:

  (target - alternative parent) < (target - principal parent) - 1.0 
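
The inequality above can be applied residue by residue once the target-parent CA distances are known. In this sketch the distance lists, the use of `None` for a residue absent from an alignment, and the function name are assumptions; the 1.0 Å margin is from the text.

```python
def alternative_parent_residues(d_principal, d_alternatives, margin=1.0):
    """Residue indices where some alternative parent beats the principal one.

    d_principal    -- per-residue CA distances target-principal parent (Angstroms)
    d_alternatives -- one such list per alternative parent
    None marks a residue missing from an alignment (assumption).
    """
    picked = []
    for i, dp in enumerate(d_principal):
        if dp is None:
            continue
        if any(alt[i] is not None and alt[i] < dp - margin for alt in d_alternatives):
            picked.append(i)
    return picked
```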

Comments:

  1. Inclusion is evaluated per residue
  2. CA distances are calculated (in Angstroms)
  3. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"SECONDARY STRUCTURE"

This subset selects secondary structure elements in the target structure. DSSP three-state output (H E -), with lower bounds of 6 residues for a helix and 3 residues for a strand, is used to define secondary structure elements. Helices and strands so defined are included.
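
Given a DSSP three-state string for the target (how the full DSSP output is reduced to H, E and - is not specified here and is assumed done), the minimum-length rule can be sketched with `itertools.groupby`. The function name is hypothetical.

```python
from itertools import groupby

def ss_residues(dssp3, min_helix=6, min_strand=3):
    """Indices of residues in H runs >= min_helix or E runs >= min_strand."""
    selected, pos = [], 0
    for state, run in groupby(dssp3):
        n = len(list(run))
        if (state == "H" and n >= min_helix) or (state == "E" and n >= min_strand):
            selected.extend(range(pos, pos + n))
        pos += n
    return selected

# the 2-residue strand at positions 9-10 is too short to count
print(ss_residues("--HHHHHH-EE-EEE"))  # [2, 3, 4, 5, 6, 7, 12, 13, 14]
```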

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"SHIFTED SS UNITS"

This subset selects SS elements that have moved significantly relative to the parent structure. The goal is to identify such non-trivial shifts. Secondary structure elements, defined as in "SECONDARY STRUCTURE" above, are marked in the structures of both the target and the principal parent. An overlap of at least 6 residues for helices and 3 residues for strands is required between target and parent secondary structure segments. A difference of 1 Angstrom in CA RMS between the global and local alignments of a selected pair of SS elements is required for the pair to be included in this subset. Residues in the overlapping regions are included in this subset.

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       cartesian rms: CRMSCA CRMSMC

"LARGE SHIFTS/INSERTIONS"

This subset selects "loop" segments. The purpose is to identify these regions of the structure, which are particularly difficult to predict. For each global alignment (DALI) between the target and parent structures, residues whose corresponding CA atoms are more than a cutoff (presently 2.5 Å) apart are included. If fewer than three residues lie between two such segments, those residues are merged into the segments and included as well.
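
The selection and merging rule can be sketched on a per-residue list of target-parent CA distances taken from the global alignment. Treating a residue with no aligned partner (`None`, an insertion) as shifted is an assumption, as is the function name; the 2.5 Å cutoff and the three-residue gap rule are from the text.

```python
def loop_residues(ca_dist, cutoff=2.5, min_gap=3):
    """Per-residue True/False flags for the LARGE SHIFTS/INSERTIONS subset."""
    flags = [d is None or d > cutoff for d in ca_dist]
    marked = [i for i, f in enumerate(flags) if f]
    # merge: fewer than min_gap unflagged residues between two flagged ones
    for a, b in zip(marked, marked[1:]):
        if 0 < b - a - 1 < min_gap:
            flags[a + 1:b] = [True] * (b - a - 1)
    return flags

# residues 2-3 are absorbed into the loop; the 3-residue gap 5-7 is not
print(loop_residues([0.5, 3.0, 1.0, 1.0, 4.0, 0.2, 0.3, 0.4, 5.0]))
```

Under this sketch, negating the flags gives the CORE subset.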

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"LIGAND CONTACTS"

This subset selects regions of structure that are in contact with the ligand molecule(s). The goal is to determine segments of structure that might have been modified by the intermolecular interactions with ligand. Intermolecular contact program CONANA is used to produce output of close contacts between ligand and protein. A cutoff of 6 Å defines protein neighborhood for local model/target structural alignment. Subsequently protein atoms in contact with ligand (4 Å cutoff) are included in this subset.

Comments:

  1. Inclusion is evaluated per atom 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"SURFACE"

The goal is to divide structure into surface and core regions. Surface accessibility is calculated according to Lee & Richards. Subsequently fractional values are calculated relative to Shrake & Rupley Gly-X-Gly standards. Residues with values greater than cutoff (presently 20% accessibility) are included.

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"BURIED"

This subset is complementary to "SURFACE".

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL

"CORE"

This subset is complementary to "LARGE SHIFTS/INSERTIONS".

Comments:

  1. Inclusion is evaluated per residue 
  2. Measures:
       angular rms:   ARMSMC ARMSSC
       cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
       ERRORS:        ERRCA  ERRMC  ERRSC  ERRALL




Distance cutoff based measures

Based on the global superposition (target - model), the absolute number and fraction of corresponding atom pairs (target - model) are calculated for several distance cutoffs.

In a given subset of distance cutoff DIST:

  DISTCA_N    denotes the number of CA atom pairs for which distance = |target-model| < DIST
  DISTCA_P    denotes the percent of CA atom pairs for which distance = |target-model| < DIST
  DISTALL_N   denotes the number of ALL atom pairs for which distance = |target-model| < DIST
  DISTALL_P   denotes the percent of ALL atom pairs for which distance = |target-model| < DIST
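
The cutoff values themselves are not listed in this text, so the ones in the example below are placeholders. A sketch computing DISTCA_N/DISTCA_P (or DISTALL_N/DISTALL_P when all atoms are passed) from already superposed, paired coordinates; taking the percentage relative to the number of pairs supplied is an assumption, and the function name is hypothetical.

```python
import math

def dist_cutoff_counts(target, model, cutoffs):
    """{DIST: (N, P)} for pairs with |target - model| < DIST after superposition."""
    d = [math.dist(t, m) for t, m in zip(target, model)]
    out = {}
    for cut in cutoffs:
        n = sum(1 for x in d if x < cut)
        out[cut] = (n, 100.0 * n / len(d))
    return out
```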

The following subsets of distance cutoff DIST are defined:


Measures based on the presence of structural alignment between target and principal parent

For global superposition of target and parent structures regions where distance between corresponding CA atoms is greater than cutoff (presently 2.5 Å) are defined as being not aligned (see "LARGE SHIFTS/INSERTIONS"). For the rest of the chain the alignment is defined as being present.

The absence of target - parent structural alignment selects "loop" segments, the regions of structure that are particularly difficult to predict. Specifying the chain region(s) in which the alignment is present provides insight into whether the changes introduced to the template (parent structure) improved the model, and to what extent.

CRMSCA T-P         RMS difference for CA atoms in aligned region for target - parent 
CRMSCA T-M         RMS difference for CA atoms in this region for target - model 
CRMSCA_LOOPS T-M   RMS difference for CA atoms not in this region for target - model (loops) 
N T-P              The number of CA atoms in aligned region for target - parent 
N T-M              The number of CA atoms in this region for target - model 
N_LOOPS T-M        The number of CA atoms not in this region for target - model (loops) 
P T-P              The percent of target CA atoms in aligned region for target - parent
P T-M              The percent of target CA atoms in this region for target - model
P_LOOPS T-M        The percent of target CA atoms not in this region for target - model (loops)
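
Given a per-residue flag marking where the target-parent alignment is present, the target-model RMS splits into the aligned region and the loops as sketched below. The flag list and the function name are assumptions, and the coordinates are assumed to be already superposed; an empty region returns `None`.

```python
import math

def split_rms(target, model, aligned):
    """(RMS over aligned CA pairs, RMS over the remaining loop CA pairs)."""
    def rms(pairs):
        if not pairs:
            return None
        return math.sqrt(sum(math.dist(t, m) ** 2 for t, m in pairs) / len(pairs))
    core = [(t, m) for t, m, a in zip(target, model, aligned) if a]
    loops = [(t, m) for t, m, a in zip(target, model, aligned) if not a]
    return rms(core), rms(loops)
```

The corresponding N and P values are simply the sizes of the two pair lists and their share of the target CA atoms.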


Alignment accuracy measures based on the lowest RMS sequence independent superposition, when such superposition could be generated by the DALI server

For each residue in the model structure the closest residue in the target structure is identified.

The number of residues in the model for which the closest residue in the target is the correct one, and the distance between them is less than 3.8 Angstroms, is reported. This gives the number of correctly aligned residues.

The number of positions for which the model residue is closest to a residue in the target within +/-4 residues, and the distance is less than 3.8 Angstroms is reported. This gives the number aligned within 4 residues of the correct value.
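
Both counts can be sketched once the model has been superposed on the target. The text uses the DALI sequence-independent superposition; here any superposition is assumed to have been applied already, residue i of the model is assumed to correspond to residue i of the target, and percentages are taken relative to the number of model residues. The function name is hypothetical.

```python
import math

def alignment_accuracy(model_ca, target_ca, window=4, max_dist=3.8):
    """(ALIGN_A_N, ALIGN_A_P, ALIGN_A4_N, ALIGN_A4_P) from superposed CA lists."""
    exact = within = 0
    for i, m in enumerate(model_ca):
        dists = [math.dist(m, t) for t in target_ca]
        j = min(range(len(dists)), key=dists.__getitem__)   # closest target residue
        if dists[j] < max_dist:
            if j == i:
                exact += 1
            if abs(j - i) <= window:
                within += 1
    n = len(model_ca)
    return exact, 100.0 * exact / n, within, 100.0 * within / n
```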

ALIGN_A_N       Number of a.a. aligned exactly
ALIGN_A_P       Percent of a.a. aligned exactly
ALIGN_A4_N      Number of a.a. aligned within +/-4 sequence window
ALIGN_A4_P      Percent of a.a. aligned within +/-4 sequence window





Protein Structure Prediction Center