Automatic evaluation of comparative modeling predictions includes
measures related to:
In addition, prediction quality is evaluated using measures based on:
In a given subset:
ARMSMC denotes dihedral angle rms for phi and psi angles.
ARMSSC denotes dihedral angle rms for chi angles.
ARMSSC1 denotes dihedral angle rms for chi1 angles.
ARMSSC2 denotes dihedral angle rms for chi2 angles.
ARMSSC3 denotes dihedral angle rms for chi3 angles.
ARMSSC4 denotes dihedral angle rms for chi4 angles.
ANGMC_NP denotes the number of main chain dihedral angles in the submitted prediction.
ANGSC1_NP denotes the number of chi1 dihedral angles in the submitted prediction.
ANGSC2_NP denotes the number of chi2 dihedral angles in the submitted prediction.
ANGSC3_NP denotes the number of chi3 dihedral angles in the submitted prediction.
ANGSC4_NP denotes the number of chi4 dihedral angles in the submitted prediction.
ANGMC_TN denotes the total number of main chain dihedral angles in the target structure.
ANGSC1_TN denotes the total number of chi1 dihedral angles in the target structure.
ANGSC2_TN denotes the total number of chi2 dihedral angles in the target structure.
ANGSC3_TN denotes the total number of chi3 dihedral angles in the target structure.
ANGSC4_TN denotes the total number of chi4 dihedral angles in the target structure.
ANGMC_PP denotes percent of main chain dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGMC_NP/ANGMC_TN.
ANGSC1_PP denotes percent of chi1 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC1_NP/ANGSC1_TN.
ANGSC2_PP denotes percent of chi2 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC2_NP/ANGSC2_TN.
ANGSC3_PP denotes percent of chi3 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC3_NP/ANGSC3_TN.
ANGSC4_PP denotes percent of chi4 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC4_NP/ANGSC4_TN.
ANGMC_PC denotes percent of main chain dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGMC_NP.
ANGSC1_PC denotes percent of chi1 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC1_NP.
ANGSC2_PC denotes percent of chi2 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC2_NP.
ANGSC3_PC denotes percent of chi3 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC3_NP.
ANGSC4_PC denotes percent of chi4 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC4_NP.
Comments:
1. Dihedral angles are calculated in degrees and belong to the interval
[-180, 180].
2. Only the angles calculated based on the atoms provided in the target
structure are included.
3. For each subset RMS is calculated by formula: SQRT(SUM(z*z)/n)
4. Dihedral angles are calculated provided that all four atoms involved
fall into a given subset. An exception is made for main chain angles
(phi and psi, for subsets other than ALL): the first or fourth atom of
the dihedral angle set does not have to belong to the given subset.
5. When calculating chi2 and chi3 angles, "swapping" can be considered (optional).
In amino acids where atom names can be switched, i.e.
for chi2 in PHE: CD1 <-> CD2
for chi2 in TYR: CD1 <-> CD2
for chi2 in ASP: OD1 <-> OD2
for chi3 in GLU: OE1 <-> OE2
the angular rms is calculated with an option to take the value more
favorable for the predictor: atoms CD1 and CD2 in PHE and TYR, atoms
OD1 and OD2 in ASP, and atoms OE1 and OE2 in GLU are exchanged.
6. When calculating chi angles, the following residues are considered:
for chi1: VAL, LEU, ILE, PRO, MET, PHE, TRP, SER, THR, CYS, TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
for chi2: LEU, ILE, PRO, MET, PHE, TRP, TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
for chi3: MET, GLN, GLU, LYS, ARG
for chi4: LYS, ARG
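As an illustration of comments 1 and 3 and of the _PC measures, the following Python sketch computes an angular rms and the percent of angles within the 30-degree cutoff. The function names are hypothetical, and wrapping the angle difference into [-180, 180) before squaring is an assumption: the comments above do not state how periodicity is handled.

```python
import math

def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180).
    The wrapping is an assumption, not something the text specifies."""
    return (a - b + 180.0) % 360.0 - 180.0

def angular_rms(pred, target):
    """SQRT(SUM(z*z)/n) over paired dihedral angles given in degrees."""
    diffs = [angle_diff(p, t) for p, t in zip(pred, target)]
    return math.sqrt(sum(z * z for z in diffs) / len(diffs))

def percent_correct(pred, target, cutoff=30.0):
    """Percent of evaluated angles whose absolute error is below the cutoff."""
    ok = sum(1 for p, t in zip(pred, target) if abs(angle_diff(p, t)) < cutoff)
    return 100.0 * ok / len(pred)

# Made-up chi1 values for four residues (prediction vs. target).
pred = [170.0, -60.0, 65.0, 180.0]
target = [-175.0, -65.0, 175.0, -170.0]
rms = angular_rms(pred, target)
pc = percent_correct(pred, target)
```

With wrapping, 170 and -175 degrees differ by 15 degrees rather than 345, so three of the four example angles count as correct.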
CRN denotes rms for C-alpha atoms / number of predicted C-alpha atoms,
i.e. CRMSCA/NP
In a given subset:
CRMSCA denotes rms difference for C-alpha atoms.
CRMSMC denotes rms difference for main chain and C-beta atoms.
CRMSSC denotes rms difference for side chain atoms.
CRMSALL denotes rms difference for all atoms.
ATOMCA_NP denotes the number of CA atoms in the submitted prediction.
ATOMMC_NP denotes the number of main chain and C-beta atoms in the submitted prediction.
ATOMSC_NP denotes the number of side chain atoms in the submitted prediction.
ATOMALL_NP denotes the number of all atoms in the submitted prediction.
ATOMCA_TN denotes the total number of CA atoms in the target structure.
ATOMMC_TN denotes the total number of main chain and C-beta atoms in the target structure.
ATOMSC_TN denotes the total number of side chain atoms in the target structure.
ATOMALL_TN denotes the total number of all atoms in the target structure.
ATOMCA_PP denotes percent of CA atoms it is possible to evaluate in the
submitted prediction, i.e. ATOMCA_NP/ATOMCA_TN
ATOMMC_PP denotes percent of main chain and C-beta atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMMC_NP/ATOMMC_TN
ATOMSC_PP denotes percent of side chain atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMSC_NP/ATOMSC_TN
ATOMALL_PP denotes percent of all atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMALL_NP/ATOMALL_TN
Comments:
1. Only the atoms provided in the target structure are included.
2. For each subset RMS in Angstroms is calculated by formula:
SQRT(SUM(z*z)/n)
3. Global superposition of model and target can be calculated based on the
position of one of the following atom sets (default: d):
a) all atoms
b) C-alpha atoms
c) main chain including C-beta atoms
d) C-alpha atoms using iterative superposition procedure (ISP) with
cutoff (default 2.5 Å)
4. The goal of the ISP method is to exclude from the calculations atom
pairs whose model-target distance exceeds a threshold (cutoff) after
the transform is applied.
Starting from the initial set of atoms (C-alphas), the algorithm is as
follows:
a) obtain the transform
b) apply the transform
c) identify all atom pairs for which the distance is larger than the threshold
d) re-obtain the transform, excluding those atoms
e) repeat b) - d) until the set of atoms used in calculations
is the same for two consecutive cycles
5. Global superposition of parent and target is calculated based on the
position of C-alpha atoms
6. Local superposition of model and target is calculated based on the
position of the local set of C-alpha atoms (except "LIGAND CONTACTS"
subset where all atoms being in contact are used)
7. At least 3 atoms must be available to calculate a superposition
8. When calculating RMS, "swapping" can be considered (optional).
In amino acids where atom names can be switched, i.e.
for ASP: OD1 <-> OD2
for GLU: OE1 <-> OE2
for PHE: CD1 <-> CD2, CE1 <-> CE2
for TYR: CD1 <-> CD2, CE1 <-> CE2
the cartesian rms is calculated with an option to take the value more
favorable for the predictor: the sets (CD1, CE1) and (CD2, CE2) in PHE
and TYR, atoms OD1 and OD2 in ASP, and atoms OE1 and OE2 in GLU are
exchanged, and the more favorable contributions to the rms are taken
into account.
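The iterative superposition procedure (ISP) of comment 4 can be sketched in Python as below. This is a minimal illustration, not the evaluation code itself. The function names are hypothetical, and the transform is passed in as a function: the translation-only `centroid_fit` is a stand-in for a full least-squares fit of rotation and translation, used only to keep the example self-contained. Re-testing every initial atom pair in each cycle, and capping the number of cycles, are also assumptions.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def centroid_fit(model, target):
    """Hypothetical stand-in transform: a translation matching the centroids.
    A real evaluation would fit rotation and translation by least squares."""
    cm, ct = centroid(model), centroid(target)
    shift = tuple(ct[k] - cm[k] for k in range(3))
    return lambda p: tuple(p[k] + shift[k] for k in range(3))

def isp(model, target, fit=centroid_fit, cutoff=2.5, max_cycles=50):
    """Return the indices of the atom pairs kept by the iterative superposition."""
    kept = list(range(len(model)))
    for _ in range(max_cycles):
        if len(kept) < 3:      # at least 3 atoms are needed (comment 7)
            break
        # a) / d) obtain the transform from the currently kept pairs
        transform = fit([model[i] for i in kept], [target[i] for i in kept])
        # b) apply it, c) drop pairs farther apart than the cutoff
        new_kept = [i for i in range(len(model))
                    if math.dist(transform(model[i]), target[i]) <= cutoff]
        if new_kept == kept:   # e) same set for two consecutive cycles
            break
        kept = new_kept
    return kept

# Ten C-alpha positions; the model is shifted, with one badly placed atom.
target = [(float(i), 0.0, 0.0) for i in range(10)]
model = [(x + 10.0, y, z) for x, y, z in target]
model[9] = (19.0, 10.0, 0.0)
kept = isp(model, target)
```

In this example the misplaced atom skews the first fit, is excluded in the first cycle, and the second cycle reproduces the same nine-atom set, which ends the iteration.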
Cartesian rms in both global and local superposition has been calculated here on C-alpha (CA), main chain (MC), and all atoms (ALL) for each loop that contains at least three residues. The loop definition is as provided below (see "LARGE SHIFTS/INSERTIONS").
In a given loop:
NP denotes the number of atoms in the submitted prediction.
TN denotes the total number of atoms in the target structure.
PP denotes percent of atoms it is possible to evaluate in the submitted
prediction, i.e. NP/TN
In a given subset:
ERRCA denotes error estimates for C-alpha atoms.
ERRMC denotes error estimates for main chain and C-beta atoms.
ERRSC denotes error estimates for side chain atoms.
ERRALL denotes error estimates for all atoms.
NP denotes the number of atoms in the submitted prediction.
TN denotes the total number of atoms in the target structure.
PP denotes percent of atoms it is possible to evaluate in the submitted
prediction, i.e. NP/TN.
Comments:
Accuracy of estimates of atomic coordinate errors between model and target
is calculated using the following formulas:
|D-E| is the mean deviation between the observed distance in atomic
positions (D) and the estimated one (E), in Angstroms:
|D-E| = (1/N) * SUM(|D(i)-E(i)|)
where D(i) is the distance between atoms (model - target) and E(i) is
the predictor-provided estimate for the specified atom. The sum is over
all N atoms in the given subset.
|D-E|/|D+E| is the mean value of the normalized deviation:
|D-E|/|D+E| = (1/N) * SUM(|D(i)-E(i)| / (|D(i)|+|E(i)|))
Since the following is true for each atom:
0 <= |D(i)-E(i)| <= |D(i)|+|E(i)|
this measure approaches 0 for correctly estimated errors
and 1 for wrong error judgments.
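A short Python sketch of the two measures may make the formulas concrete. The function names and the sample values are hypothetical. Treating an atom with D(i) = E(i) = 0 as contributing 0 to the normalized sum is an assumption, since the text does not cover that case.

```python
def mean_deviation(d, e):
    """|D-E|: mean absolute difference between observed and estimated errors."""
    return sum(abs(di - ei) for di, ei in zip(d, e)) / len(d)

def mean_normalized_deviation(d, e):
    """|D-E|/|D+E|: mean of |D(i)-E(i)| / (|D(i)|+|E(i)|) over all atoms."""
    total = 0.0
    for di, ei in zip(d, e):
        denom = abs(di) + abs(ei)
        # Assumption: an atom with D(i) = E(i) = 0 contributes 0 to the sum.
        total += abs(di - ei) / denom if denom > 0.0 else 0.0
    return total / len(d)

observed = [1.0, 2.0, 0.5, 4.0]    # D(i): model-target distances in Angstroms
estimated = [1.0, 1.0, 1.5, 0.0]   # E(i): predictor-provided estimates
dev = mean_deviation(observed, estimated)
norm = mean_normalized_deviation(observed, estimated)
```

The first atom is estimated exactly and contributes 0 to both measures, while the last atom, estimated as error-free but actually 4 Angstroms off, contributes the maximum normalized value of 1.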
All atoms or dihedral angles possible to evaluate are considered.
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
LOOP rms: CRMSCA CRMSMC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
In the case of crystallographically determined structures, this subset selects parts of the target structure that are not affected by the uncertainty associated with thermal motion or disorder. Target structure temperature factors and reported alternative atomic positions are used as discriminators. If the temperature factor of an atom is greater than the cutoff (presently 20 square Angstroms), the atom is not included. Similarly, when an alternative position is listed for an atom, that atom is not included either.
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
This subset selects parts of the target structure that are not affected by interactions with neighboring molecules (crystal contacts). If an atom is in an intermolecular contact (presently defined as a distance smaller than 4 Å, calculated with J. Moult's CONANA), it is not included in this subset.
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
This subset selects segments of sidechains deemed unreliable crystallographically (a rotation of 180 degrees could be undetectable), and excludes them from cartesian and angular RMS calculations:
from residue HIS: atoms ND1 CD2 CE1 NE2, angle chi2
from residue ASN: atoms OD1 ND2, angle chi2
from residue GLN: atoms OE1 NE2, angle chi3
from residue VAL: atoms CG1 CG2, angle chi1
from residue LEU: atoms CD1 CD2, angle chi2
Comments:
1. Above atoms and dihedral angles are excluded.
2. Measures:
angular rms: ARMSSC
cartesian rms: CRMSSC
ERRORS: ERRSC
This subset selects angles that are rotamerically different from the corresponding ones in the parent structure. The goal here is to select a set of dihedral angles that have been particularly difficult to predict correctly. For each global alignment (DALI), corresponding side chain dihedrals in target and parent structure are checked for consistency. Discrepancies greater than cutoff (presently 30 degrees) are marked and each corresponding chi1 - chi4 is included.
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSSC
cartesian rms: CRMSSC
ERRORS: ERRSC
This subset selects segments of the target structure whose position differs from the parent by more than a cutoff in the global alignment. Again, the goal is to mark segments of structure for which prediction has been particularly difficult. For each global alignment (DALI) between the structures of target and parent, residues with corresponding CA distances greater than 1 Angstrom are included.
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
This subset selects segments of target structure for which selection of a parent other than the most obvious one has been correct. The purpose is to mark segments for which making a non-trivial selection of the template produces better results. For global alignments (DALI) between structures of target and principal parent (by sequence identity) as well as target and each alternative parent, residues for which the following is true are included:
(target - alternative parent) < (target - principal parent) - 1.0
Comments:
1. Inclusion is evaluated per residue
2. CA distances are calculated (in Angstroms)
3. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
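The inclusion rule above can be illustrated with a small Python sketch. The function name, the dictionary layout (residue number mapped to a target-parent CA distance in Angstroms) and the sample distances are hypothetical. Skipping residues that are missing from either alignment is an assumption.

```python
def alternative_parent_residues(d_principal, d_alternative, margin=1.0):
    """Return residues where the alternative parent is closer to the target
    than the principal parent by more than `margin` Angstroms."""
    selected = []
    for res, d_alt in d_alternative.items():
        d_pri = d_principal.get(res)
        # Assumption: residues absent from either alignment are skipped.
        if d_pri is not None and d_alt < d_pri - margin:
            selected.append(res)
    return sorted(selected)

# Made-up CA distances (target - parent) per residue number.
d_principal = {10: 0.8, 11: 3.2, 12: 4.1, 13: 1.9}
d_alternative = {10: 0.9, 11: 1.5, 12: 3.5, 14: 0.4}
selected = alternative_parent_residues(d_principal, d_alternative)
```

Only residue 11 is selected here: residue 12 improves by 0.6 Angstrom, which is less than the 1.0 Angstrom margin.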
This subset selects secondary structure elements in the target structure. DSSP three-state output (H E -) with lower bounds of 6 residues for helix and 3 residues for strand is used to define secondary structure elements. Helices and strands so defined are included.
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
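As a sketch of how the length bounds act on a three-state string, the following Python function extracts helix and strand elements. The function name and the example string are hypothetical. Indices are 0-based positions in the string, not residue numbers from a structure file.

```python
from itertools import groupby

MIN_LENGTH = {"H": 6, "E": 3}   # lower bounds quoted in the text

def secondary_structure_elements(ss):
    """Return (state, start, end) for helix/strand runs meeting the bounds.
    `ss` is a three-state string (H, E, -); `end` is inclusive."""
    elements = []
    pos = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in MIN_LENGTH and length >= MIN_LENGTH[state]:
            elements.append((state, pos, pos + length - 1))
        pos += length
    return elements

ss = "--HHHHHHH--EE-EEEE-HHHH-"
elements = secondary_structure_elements(ss)
```

In the example, the two-residue strand and the four-residue helix fall below the bounds and are not reported.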
This subset selects SS elements that have significantly moved relative to the parent structure. The goal is to denote such non-trivial shifts. Secondary structure elements defined as in "SECONDARY STRUCTURE" above are marked in structures of both target and principal parent. An overlap of at least 6 and 3 residues for helix and strand respectively is required between segments of secondary structure of target and parent. A difference of 1 Angstrom RMS on CA's is required between global and local alignment of selected pairs of SS elements to be included in this subset. Residues in the overlapping regions are included in this subset.
Comments:
1. Inclusion is evaluated per residue
2. Measures:
cartesian rms: CRMSCA CRMSMC
This subset selects "loop" segments. The purpose is to mark these regions of structure, which are particularly difficult to predict. For each global alignment (DALI) between the structures of target and parent, residues with corresponding CA distances greater than the cutoff (presently 2.5 Å) are included. If fewer than three residues lie between two such segments, they are merged into the segments and included as well.
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
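The segment selection and merging described for this subset can be sketched in Python as follows. The function name and the sample distances are hypothetical, and the sketch assumes that a target-parent CA distance is available for every residue position.

```python
def loop_residues(ca_dist, cutoff=2.5, max_gap=2):
    """Flag residues whose target-parent CA distance exceeds the cutoff, then
    merge gaps of fewer than three residues lying between flagged residues."""
    flags = [d > cutoff for d in ca_dist]
    flagged = [i for i, f in enumerate(flags) if f]
    for left, right in zip(flagged, flagged[1:]):
        gap = right - left - 1
        if 0 < gap <= max_gap:
            for i in range(left + 1, right):
                flags[i] = True
    return [i for i, f in enumerate(flags) if f]

# Made-up CA distances in Angstroms for ten consecutive residues.
ca_dist = [0.5, 3.0, 3.1, 0.7, 0.9, 2.8, 0.4, 0.3, 0.6, 4.0]
loops = loop_residues(ca_dist)
```

Residues 3 and 4 form a two-residue gap between flagged residues and are merged, whereas the three-residue gap before residue 9 is left out.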
This subset selects regions of structure that are in contact with the ligand molecule(s). The goal is to determine segments of structure that might have been modified by the intermolecular interactions with ligand. Intermolecular contact program CONANA is used to produce output of close contacts between ligand and protein. A cutoff of 6 Å defines protein neighborhood for local model/target structural alignment. Subsequently protein atoms in contact with ligand (4 Å cutoff) are included in this subset.
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
The goal is to divide structure into surface and core regions. Surface accessibility is calculated according to Lee & Richards. Subsequently fractional values are calculated relative to Shrake & Rupley Gly-X-Gly standards. Residues with values greater than cutoff (presently 20% accessibility) are included.
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
This subset is complementary to "SURFACE".
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
This subset is complementary to "LARGE SHIFTS/INSERTIONS".
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
In a given subset of distance cutoff DIST:
DISTCA_N denotes the number of CA atom pairs for which distance = |target-model| < DIST
DISTCA_P denotes the percent of CA atom pairs for which distance = |target-model| < DIST
DISTALL_N denotes the number of ALL atom pairs for which distance = |target-model| < DIST
DISTALL_P denotes the percent of ALL atom pairs for which distance = |target-model| < DIST
The following subsets of distance cutoff DIST are defined:
The absence of a target - parent structural alignment selects "loop" segments, which are particularly difficult to predict regions of structure. Specifying the chain region(s) for which an alignment is present provides insight into whether changes introduced in the template (parent structure) improved the model, and to what extent.
CRMSCA T-P        RMS difference for CA atoms in aligned region for target - parent
CRMSCA T-M        RMS difference for CA atoms in this region for target - model
CRMSCA_LOOPS T-M  RMS difference for CA atoms not in this region for target - model (loops)
N T-P             The number of CA atoms in aligned region for target - parent
N T-M             The number of CA atoms in this region for target - model
N_LOOPS T-M       The number of CA atoms not in this region for target - model (loops)
P T-P             The percent of target CA atoms in aligned region for target - parent
P T-M             The percent of target CA atoms in this region for target - model
P_LOOPS T-M       The percent of target CA atoms not in this region for target - model (loops)
The number of residues in the model for which the closest residue in the target is the correct one, and the distance between them is less than 3.8 Angstroms, is reported. This gives the number of correctly aligned residues.
The number of positions for which the model residue is closest to a residue in the target within +/-4 residues, and the distance is less than 3.8 Angstroms, is reported. This gives the number aligned within 4 residues of the correct value.
ALIGN_A_N   Number of a.a. aligned exactly
ALIGN_A_P   Percent of a.a. aligned exactly
ALIGN_A4_N  Number of a.a. aligned within +/-4 sequence window
ALIGN_A4_P  Percent of a.a. aligned within +/-4 sequence window
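The two counts can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function name and the coordinates are hypothetical, both structures are taken to be already superimposed, each residue is represented by its CA position, and ties between equally close target residues are not treated specially. The percentages are not computed because the text does not state their denominator.

```python
import math

def alignment_counts(model_ca, target_ca, cutoff=3.8, window=4):
    """Count model residues whose nearest target CA is the same residue
    (exact) or lies within +/-window positions, at a distance below cutoff.
    Both arguments map residue number -> (x, y, z)."""
    exact = within = 0
    for res, xyz in model_ca.items():
        nearest = min(target_ca, key=lambda r: math.dist(xyz, target_ca[r]))
        if math.dist(xyz, target_ca[nearest]) < cutoff:
            if nearest == res:
                exact += 1
            if abs(nearest - res) <= window:
                within += 1
    return exact, within

# Made-up target: eight CA atoms spaced 3.8 Angstroms apart along x.
target_ca = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 9)}
model_ca = {i: target_ca[i] for i in range(1, 5)}   # residues 1-4 exact
model_ca[5] = target_ca[7]                          # shifted by two residues
model_ca[6] = target_ca[8]                          # shifted by two residues
model_ca[7] = (100.0, 0.0, 0.0)                     # far from every target atom
align_a_n, align_a4_n = alignment_counts(model_ca, target_ca)
```

Here four residues are aligned exactly, two more fall within the +/-4 window, and the distant residue 7 is counted in neither total.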