This document provides a short description of the numerical results for two types of protein structure predictions conforming to CASP3 submission criteria:
Conformational state of residues is defined as follows:
Percentage of residues for which a secondary structure prediction was made (residues that were assigned a secondary structure with nonzero probability). This number is provided for reference.
Qindex (Qhelix, Qstrand, Qcoil, Q3) gives the percentage of residues predicted correctly as helix (H), strand (E), coil (C), or for all three conformational states. The definition of Qindex is as follows.
For a single conformational state:
    Qi = (number of residues correctly predicted in state i
          / number of residues observed in state i) * 100,
where i is either helix, strand or coil.
For all three states:
    Q3 = (number of residues correctly predicted
          / number of all residues) * 100
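The Qindex definitions above can be sketched in a few lines of Python. This is only an illustration: the function name q_scores and the one-letter-per-residue string encoding (H, E, C) are assumptions made for the example, not part of the evaluation software.

```python
def q_scores(observed, predicted, states="HEC"):
    """Return ({state: Qi}, Q3) in percent for two equal-length state strings."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    q = {}
    for s in states:
        # residues observed in state s, and those also predicted in state s
        n_obs = sum(1 for o in observed if o == s)
        n_ok = sum(1 for o, p in zip(observed, predicted) if o == s and p == s)
        q[s] = 100.0 * n_ok / n_obs if n_obs else None  # undefined if state absent
    n_correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    q3 = 100.0 * n_correct / len(observed) if observed else None
    return q, q3


obs = "HHHHCCEEEE"
pred = "HHHCCCEEEC"
q, q3 = q_scores(obs, pred)
print(q, q3)  # {'H': 75.0, 'E': 75.0, 'C': 100.0} 80.0
```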
For a single conformational state:
    SOV(i) = (1/N(i)) * SUM over S(i) of
             [ (MINOV(S1;S2) + DELTA(S1;S2)) / MAXOV(S1;S2) * LEN(S1) ]
where
S1 and S2 are the observed and predicted secondary structure segments
(in state i, which can be either H, E or C);
LEN(S1) is the number of residues in the segments S1;
MINOV(S1;S2) is the length of actual overlap of S1 and S2, i.e.
the extent for which both segments have residues in state i,
for example H;
MAXOV(S1;S2) is the length of the total extent for which either of
the segments S1 or S2 has a residue in state i;
DELTA(S1;S2) is the integer value defined as being equal to the
MIN{(MAXOV(S1;S2)- MINOV(S1;S2)); MINOV(S1;S2);
INT(LEN(S1)/2); INT(LEN(S2)/2)}
THE SUM is taken over S, all the pairs of segments {S1;S2},
where S1 and S2 have at least one residue in state i
in common;
N(i) is the number of residues in state i defined as follows:
    N(i) = SUM over S(i) of LEN(S1) + SUM over S'(i) of LEN(S1)
The two sums are taken over S(i) and S'(i):
S(i) is the set of all the pairs of segments {S1;S2},
where S1 and S2 have at least one residue in state i
in common;
S'(i) is the set of segments S1 that do not produce
any segment pair.
For all three states:
    SOV = (1/N) * SUM over i of SUM over S(i) of
          [ (MINOV(S1;S2) + DELTA(S1;S2)) / MAXOV(S1;S2) * LEN(S1) ]
where the normalization value N is a sum of N(i) over all three
conformational states (i = HELIX, STRAND, COIL):
    N = SUM over i of N(i)
SOV observed indicates that S1 is the observed segment and S2 is the predicted one.
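As a minimal sketch of the SOV definition above (with S1 observed and S2 predicted), the following Python computes the per-state and overall values. The function names and the string encoding are assumptions for illustration. The formulas are followed literally, so the result lies between 0 and 1; no factor of 100 is applied because none appears in the definition.

```python
def segments(ss, state):
    """Return inclusive (start, end) index pairs of maximal runs of `state`."""
    segs, start = [], None
    for k, c in enumerate(ss):
        if c == state and start is None:
            start = k
        elif c != state and start is not None:
            segs.append((start, k - 1))
            start = None
    if start is not None:
        segs.append((start, len(ss) - 1))
    return segs


def sov_terms(observed, predicted, state):
    """Return (sum over S(i), N(i)) for one conformational state."""
    total, n = 0.0, 0
    pred_segs = segments(predicted, state)
    for a1, b1 in segments(observed, state):
        len1 = b1 - a1 + 1
        paired = False
        for a2, b2 in pred_segs:
            minov = min(b1, b2) - max(a1, a2) + 1   # actual overlap
            if minov <= 0:
                continue                             # no residue in common
            paired = True
            len2 = b2 - a2 + 1
            maxov = max(b1, b2) - min(a1, a2) + 1   # total extent
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            n += len1                                # pair belongs to S(i)
        if not paired:
            n += len1                                # segment belongs to S'(i)
    return total, n


def sov(observed, predicted, states="HEC"):
    """Return ({state: SOV(i)}, overall SOV)."""
    per_state, tot_sum, tot_n = {}, 0.0, 0
    for s in states:
        t, n = sov_terms(observed, predicted, s)
        per_state[s] = t / n if n else None
        tot_sum += t
        tot_n += n
    return per_state, (tot_sum / tot_n if tot_n else None)


# One observed helix predicted as two shorter helices with a break
per_state, overall = sov("HHHHHHHHHH", "HHHHCCHHHH")
print(per_state["H"], overall)
```

In this example each of the two segment pairs contributes (4 + 2) / 10 * 10 = 6 and N(H) = 20, so SOV(H) comes out at about 0.6.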
Number of helical and strand segments of the experimental structure that are in the protein chain region common to both the experimental and predicted structures. This number is provided for reference.
Percentage of helical and strand segments predicted correctly is calculated according to the following formula (coil regions are not taken into account):
    Percent predicted correctly = (100% / NSEG) * SUM
        [ Ncorrect_i(segment) / (Ncorrect_i(segment) + Nwrong_i(segment)) ]
where
i is either H or E conformational state (coil is completely ignored);
Ncorrect_i is the number of residues predicted correctly for a segment,
which in the observed structure has conformational state i;
Nwrong_i is the number of residues predicted as a wrong state (i.e. H is
predicted as E, or E as H) for a segment, which in the observed
structure has conformational state i;
The SUM is taken over all considered segments in the observed structure;
Percentage of helical and strand segments predicted as a wrong type (i.e. H as E or E as H) is calculated as follows (coil regions are not taken into account):
    Percent predicted as a wrong type = (100% / NSEG) * SUM
        [ Nwrong_i(segment) / (Ncorrect_i(segment) + Nwrong_i(segment)) ]
where
The SUM is taken over all considered segments in the observed structure;
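The two segment-level percentages above can be sketched as follows. This is an illustration under stated assumptions: NSEG is taken to be the number of observed H and E segments, and a segment whose residues are all predicted as coil (Ncorrect + Nwrong = 0) is assumed to contribute 0 to both sums. Neither point is spelled out in the text, and the function names are hypothetical.

```python
def runs(ss, states="HE"):
    """Yield (state, start, end_exclusive) for maximal runs of H or E."""
    k = 0
    while k < len(ss):
        j = k
        while j < len(ss) and ss[j] == ss[k]:
            j += 1
        if ss[k] in states:
            yield ss[k], k, j
        k = j


def segment_percentages(observed, predicted):
    """Return (percent predicted correctly, percent predicted as wrong type)."""
    other = {"H": "E", "E": "H"}
    correct_sum = wrong_sum = 0.0
    nseg = 0
    for state, a, b in runs(observed):
        nseg += 1
        n_correct = sum(1 for p in predicted[a:b] if p == state)
        n_wrong = sum(1 for p in predicted[a:b] if p == other[state])
        denom = n_correct + n_wrong      # residues predicted as coil are ignored
        if denom:                        # assumption: all-coil segments add 0
            correct_sum += n_correct / denom
            wrong_sum += n_wrong / denom
    if nseg == 0:
        return None, None
    return 100.0 * correct_sum / nseg, 100.0 * wrong_sum / nseg


print(segment_percentages("HHHHCCEEEE", "HHEECCEEEC"))  # (75.0, 25.0)
```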
If an observed segment overlaps with n predicted segments (of the same conformational state), then it is predicted with n-1 wrong breaks. The number of wrong breaks is the sum of all such cases.
If a predicted segment overlaps with n observed segments (of the same conformational state), then the predicted segment has n-1 wrong joints. The number of wrong joints is the sum of all such cases.
Example
observed:  EEEEEEEE  EEE  EEE EEE
predicted:  EEE EEE  EEE  EEEEEEE
Number of wrong breaks - 1
Number of wrong joints - 1
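Counting wrong breaks and wrong joints as defined above could look like the sketch below. The strings reproduce the example with an assumed column alignment ("-" marks positions that are not strand), chosen so that the counts match the stated result. Restricting the count to H and E segments is also an assumption, as are the function names.

```python
def segments(ss, state):
    """Return inclusive (start, end) index pairs of maximal runs of `state`."""
    segs, start = [], None
    for k, c in enumerate(ss):
        if c == state and start is None:
            start = k
        elif c != state and start is not None:
            segs.append((start, k - 1))
            start = None
    if start is not None:
        segs.append((start, len(ss) - 1))
    return segs


def overlaps(a, b):
    """True if two inclusive index ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def extra_overlaps(first, second):
    """Sum of (n - 1) over segments in `first` overlapping n segments in `second`."""
    total = 0
    for seg in first:
        n = sum(1 for other in second if overlaps(seg, other))
        if n > 1:
            total += n - 1
    return total


def wrong_breaks_and_joints(observed, predicted, states="HE"):
    breaks = joints = 0
    for s in states:
        obs, pred = segments(observed, s), segments(predicted, s)
        breaks += extra_overlaps(obs, pred)   # observed segment split by prediction
        joints += extra_overlaps(pred, obs)   # predicted segment merging observed ones
    return breaks, joints


observed = "EEEEEEEE--EEE--EEE-EEE"
predicted = "-EEE-EEE--EEE--EEEEEEE"
print(wrong_breaks_and_joints(observed, predicted))  # (1, 1)
```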
To evaluate SSS contacts, secondary structure segments from the experimental structure are first matched with those from the prediction. If there is an overlap of at least one residue between observed and predicted secondary structure segments, these segments are considered equivalent. To be matched, secondary structure segments do not necessarily have to be of the same type (i.e. a helix can be matched with a strand). Then the list of contacting pairs of secondary structure segments in the experimental structure is contrasted with the prediction. The following definitions are used in this "alignment":
To analyze the quality of contact prediction between secondary structure segments the following values and reference numbers are provided:
the number of all contacts between secondary structure segments observed in the entire experimental structure
the number of all contacts between secondary structure segments in the entire prediction
the number of contacts in the experimental structure region for which prediction was made
Percentage of target contacts in predicted subset ("SS_T_P")
the percent of the total number of contacts in target that are observed in the experimental structure region for which prediction was made
the number of observed contacts that were predicted (without taking into account the type of contacting secondary structure segments and the type of interaction) in the chain region, common for both experimental structure and prediction.
Percentage of contacts predicted ("SS_P_PP")
the percent of observed contacts that were predicted (without taking into account the type of contacting secondary structure segments and the type of interaction) in the chain region, common for both experimental structure and prediction.
To analyze the quality of residue-residue contact prediction various reference numbers and values are provided.
FOR RESIDUES
The measures used to evaluate residue-residue contacts are calculated using 4 minimal
separation (SEP) intervals along the chain between considered residues:
Example: Let's assume that two residues are in contact and there are 6 residues in between them along the chain. This contact will be classified as belonging to 5-8 separation and will not be counted in 1-4 and 9-9999 separation intervals.
In addition, using target<-->model global superposition, the quality of prediction
is evaluated by providing
the absolute number and fraction of atom pairs (target<-->model)
for subsets of distance cutoff DIST.
CRN denotes rms for C-alpha atoms / number of predicted C-alpha atoms,
i.e. CRMSCA/NP
In a given subset:
CRMSCA denotes rms difference for C-alpha atoms.
CRMSMC denotes rms difference for main chain and C-beta atoms.
CRMSSC denotes rms difference for side chain atoms.
CRMSALL denotes rms difference for all atoms.
ATOMCA_NP denotes the number of CA atoms in the submitted prediction.
ATOMMC_NP denotes the number of main chain and C-beta atoms in the submitted prediction.
ATOMSC_NP denotes the number of side chain atoms in the submitted prediction.
ATOMALL_NP denotes the number of all atoms in the submitted prediction.
ATOMCA_TN denotes the total number of CA atoms in the target structure.
ATOMMC_TN denotes the total number of main chain and C-beta atoms in the target structure.
ATOMSC_TN denotes the total number of side chain atoms in the target structure.
ATOMALL_TN denotes the total number of all atoms in the target structure.
ATOMCA_PP denotes percent of CA atoms it is possible to evaluate in the submitted
prediction, i.e. ATOMCA_NP/ATOMCA_TN
ATOMMC_PP denotes percent of main chain and C-beta atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMMC_NP/ATOMMC_TN
ATOMSC_PP denotes percent of side chain atoms it is possible to evaluate in the submitted
prediction, i.e. ATOMSC_NP/ATOMSC_TN
ATOMALL_PP denotes percent of all atoms it is possible to evaluate in the submitted
prediction, i.e. ATOMALL_NP/ATOMALL_TN
Comments:
1. Only the atoms provided in the target structure are included.
2. For each subset RMS in Angstroms is calculated by formula:
SQRT(SUM(z*z)/n)
3. Global superposition of model and target can be calculated based on the
positions of one of the following atom sets (default: d):
a) all atoms
b) C-alpha atoms
c) main chain including C-beta atoms
d) C-alpha atoms using iterative superposition procedure (ISP) with
cutoff (default 2.5 Å)
4. The goal of the ISP method is to exclude from the calculations atoms
whose distance between the model and the target structure exceeds some
threshold (cutoff) after the transform is applied.
Starting from the initial set of atoms (C-alphas) the algorithm is the
following:
a) obtain the transform
b) apply the transform
c) identify all atoms whose deviations are larger than the threshold
d) re-obtain the transform, excluding those atoms
e) repeat b) - d) until the set of atoms used in calculations
is the same for two consecutive cycles
5. At least 3 atoms must be available to calculate a superposition.
6. When calculating RMS, "swapping" can be considered (optional).
It means that in amino acids where atom names can be switched, i.e.
for ASP: OD1 <-> OD2
for GLU: OE1 <-> OE2
for PHE: CD1 <-> CD2
CE1 <-> CE2
for TYR: CD1 <-> CD2
CE1 <-> CE2
cartesian rms is calculated with an option to take the value more
favorable for the predictor. Sets (CD1, CE1) and (CD2, CE2)
in PHE and TYR, as well as atoms OD1 and OD2 in ASP, OE1 and OE2 in
GLU are exchanged and more favorable contributions to rms are taken into
account.
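The iteration described in comment 4 can be sketched as below. To keep the example dependency-free, the "transform" here is a translation that matches centroids; this is a stand-in only, since a real superposition would be a least-squares rotation plus translation. Re-testing all atoms against the cutoff on every cycle, the max_cycles guard, and all function names are further assumptions made for the illustration.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def fit_translation(model, target):
    """Stand-in transform: the shift that maps the model centroid onto the target's."""
    cm, ct = centroid(model), centroid(target)
    return tuple(ct[k] - cm[k] for k in range(3))


def isp(model, target, cutoff=2.5, min_atoms=3, max_cycles=100):
    """Iteratively refit on the atoms that stay within `cutoff` of the target."""
    used = list(range(len(model)))                 # initial set: all C-alpha pairs
    for _ in range(max_cycles):
        if len(used) < min_atoms:
            raise ValueError("at least 3 atoms are needed for a superposition")
        # a) / d) obtain the transform from the currently used atoms
        shift = fit_translation([model[i] for i in used],
                                [target[i] for i in used])
        # b) apply the transform to the whole model
        moved = [tuple(p[k] + shift[k] for k in range(3)) for p in model]
        # c) keep only atoms whose deviation does not exceed the cutoff
        kept = [i for i in range(len(model))
                if math.dist(moved[i], target[i]) <= cutoff]
        # e) stop when the atom set is unchanged between two cycles
        if kept == used:
            return shift, used
        used = kept
    raise RuntimeError("atom set did not stabilise")


# Ten C-alpha positions; the model is shifted by 1 A in x, and the last
# atom is additionally displaced by 5 A in y (an outlier).
target = [(3.8 * k, 0.0, 0.0) for k in range(10)]
model = [(x + 1.0, y, z) for x, y, z in target]
model[9] = (model[9][0], 5.0, 0.0)
shift, used = isp(model, target)
print(used)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

In this example the first cycle uses all ten atoms, the outlier then falls outside the 2.5 A cutoff, and the second cycle converges on the remaining nine.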
In a given subset:
ARMSMC denotes dihedral angle rms for phi and psi angles.
ARMSSC denotes dihedral angle rms for chi angles.
ARMSSC1 denotes dihedral angle rms for chi1 angles.
ARMSSC2 denotes dihedral angle rms for chi2 angles.
ANGMC_NP denotes the number of main chain dihedral angles in the submitted prediction.
ANGSC1_NP denotes the number of chi1 dihedral angles in the submitted prediction.
ANGSC2_NP denotes the number of chi2 dihedral angles in the submitted prediction.
ANGMC_TN denotes the total number of main chain dihedral angles in the target structure.
ANGSC1_TN denotes the total number of chi1 dihedral angles in the target structure.
ANGSC2_TN denotes the total number of chi2 dihedral angles in the target structure.
ANGMC_PP denotes percent of main chain dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGMC_NP/ANGMC_TN.
ANGSC1_PP denotes percent of chi1 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC1_NP/ANGSC1_TN.
ANGSC2_PP denotes percent of chi2 dihedral angles it is possible to evaluate in
the submitted prediction, i.e. ANGSC2_NP/ANGSC2_TN.
ANGMC_PC denotes percent of main chain dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGMC_NP.
ANGSC1_PC denotes percent of chi1 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC1_NP.
ANGSC2_PC denotes percent of chi2 dihedral angles correct, i.e. with error
smaller than cutoff (30 degrees) and relative to ANGSC2_NP.
Comments:
1. Dihedral angles are calculated in degrees and belong to the interval
[-180, 180].
2. Only the angles calculated based on the atoms provided in the target
structure are included.
3. For each subset RMS is calculated by formula: SQRT(SUM(z*z)/n)
4. Dihedral angles are calculated provided that all four atoms involved
fall into a given subset. An exception is made for main chain angles
(phi and psi for subset different than ALL) for which first or fourth
atom of the dihedral angle set does not have to belong to a given subset.
5. When calculating chi2 and chi3 angles, "swapping" can be considered (optional).
It means that in amino acids where atom names can be switched, i.e.
for chi2 in amino acids PHE: CD1 <-> CD2
TYR: CD1 <-> CD2
ASP: OD1 <-> OD2
for chi3 in amino acid GLU: OE1 <-> OE2
angular rms is calculated with an option to take the value more
favorable for the predictor. Atoms CD1 and CD2 in PHE and TYR, as well
as atoms OD1 and OD2 in ASP, OE1 and OE2 in GLU are exchanged.
6. When calculating chi angles, the following residues are considered:
for chi1: VAL, LEU, ILE, PRO, MET, PHE, TRP, SER, THR,
CYS, TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
for chi2: LEU, ILE, PRO, MET, PHE, TRP,
TYR, ASN, GLN, ASP, GLU, LYS, ARG, HIS
for chi3: MET,
GLN, GLU, LYS, ARG
for chi4: LYS, ARG
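A minimal sketch of the angular RMS (SQRT(SUM(z*z)/n)) and the percent-correct measure with the 30 degree cutoff is given below. Wrapping each angle difference into [-180, 180) before squaring is an assumption (the text only states that the angles themselves lie in [-180, 180]), and the function names are hypothetical.

```python
import math


def angle_diff(a, b):
    """Difference a - b in degrees, wrapped to the interval [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def angular_rms(model, target):
    """SQRT(SUM(z*z)/n) over wrapped angle differences z."""
    diffs = [angle_diff(m, t) for m, t in zip(model, target)]
    return math.sqrt(sum(z * z for z in diffs) / len(diffs))


def percent_correct(model, target, cutoff=30.0):
    """Percent of angles whose absolute error is smaller than `cutoff`."""
    n_ok = sum(1 for m, t in zip(model, target)
               if abs(angle_diff(m, t)) < cutoff)
    return 100.0 * n_ok / len(model)


target_angles = [-60.0, 170.0, 60.0, -175.0]
model_angles = [-50.0, -170.0, 100.0, 175.0]
print(angular_rms(model_angles, target_angles))      # sqrt(550), about 23.45
print(percent_correct(model_angles, target_angles))  # 75.0
```

With wrapping, the pair (170, -170) counts as a 20 degree error rather than 340 degrees.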
In a given subset:
ERRCA_A denotes absolute error estimates (|D-E|) for C-alpha atoms.
ERRCA_R denotes relative error estimates (|D-E|/|D+E|) for C-alpha atoms.
ERRMC_A denotes absolute error estimates (|D-E|) for main chain and C-beta atoms.
ERRMC_R denotes relative error estimates (|D-E|/|D+E|) for main chain and C-beta atoms.
ERRSC_A denotes absolute error estimates (|D-E|) for side chain atoms.
ERRSC_R denotes relative error estimates (|D-E|/|D+E|) for side chain atoms.
ERRALL_A denotes absolute error estimates (|D-E|) for all atoms.
ERRALL_R denotes relative error estimates (|D-E|/|D+E|) for all atoms.
ATOMCA_NP denotes the number of CA atoms in the submitted prediction.
ATOMMC_NP denotes the number of main chain and C-beta atoms in the submitted prediction.
ATOMSC_NP denotes the number of side chain atoms in the submitted prediction.
ATOMALL_NP denotes the number of all atoms in the submitted prediction.
ATOMCA_TN denotes the total number of CA atoms in the target structure.
ATOMMC_TN denotes the total number of main chain and C-beta atoms in the target structure.
ATOMSC_TN denotes the total number of side chain atoms in the target structure.
ATOMALL_TN denotes the total number of all atoms in the target structure.
ATOMCA_PP denotes percent of CA atoms it is possible to evaluate in the
submitted prediction, i.e. ATOMCA_NP/ATOMCA_TN
ATOMMC_PP denotes percent of main chain and C-beta atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMMC_NP/ATOMMC_TN
ATOMSC_PP denotes percent of side chain atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMSC_NP/ATOMSC_TN
ATOMALL_PP denotes percent of all atoms it is possible to evaluate
in the submitted prediction, i.e. ATOMALL_NP/ATOMALL_TN
Comments:
Accuracy of estimates of atomic coordinate errors between model and target
is calculated using the following formulas:
|D-E| is the mean deviation between the observed distance in atomic
positions (D) and the estimated one (E) in Angstroms:
    |D-E| = (1/N) * SUM |D(i)-E(i)|
where D(i) is the distance between atoms (model - target)
and E(i) is the predictor-provided estimate for the specified
atom. The sum is over all atoms in the given subset.
|D-E|/|D+E| is the mean value of the normalized deviation:
    |D-E|/|D+E| = (1/N) * SUM [ |D(i)-E(i)| / (|D(i)|+|E(i)|) ]
Since the following is true for each atom:
    0 <= |D(i)-E(i)| <= |D(i)|+|E(i)|
this measure approaches 0 for correctly estimated errors
and 1 for wrong error judgments.
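Both error-estimate measures can be computed directly from per-atom lists of D(i) and E(i), as in this sketch. The function name is hypothetical, and treating an atom with D(i) = E(i) = 0 as a perfect estimate (contribution 0) is an assumption, since the normalized term is undefined in that case.

```python
def error_estimate_scores(d, e):
    """Return (mean |D-E|, mean |D-E| / (|D|+|E|)) over all atoms."""
    if len(d) != len(e) or not d:
        raise ValueError("d and e must be non-empty and of equal length")
    n = len(d)
    abs_err = sum(abs(di - ei) for di, ei in zip(d, e)) / n
    rel_sum = 0.0
    for di, ei in zip(d, e):
        denom = abs(di) + abs(ei)
        if denom:                      # assumption: 0/0 counts as a perfect estimate
            rel_sum += abs(di - ei) / denom
    return abs_err, rel_sum / n


# Observed deviations D(i) and predictor-provided estimates E(i), in Angstroms
d = [1.0, 2.0, 4.0]
e = [1.0, 1.0, 0.0]
abs_err, rel_err = error_estimate_scores(d, e)
print(abs_err, rel_err)  # about 1.667 and 0.444
```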
Comments:
1. Inclusion is evaluated per atom
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
Reference numbers: NP TN PP
from residue HIS: atoms ND1 CD2 CE1 NE2 angle chi2
from residue ASN: atoms OD1 ND2 angle chi2
from residue GLN: atoms OE1 NE2 angle chi3
from residue VAL: atoms CG1 CG2 angle chi1
from residue LEU: atoms CD1 CD2 angle chi2
Comments:
1. Above atoms and dihedral angles are excluded.
2. Measures:
angular rms: ARMSSC
cartesian rms: CRMSSC
ERRORS: ERRSC
Reference numbers: NP TN PP PC
Comments:
1. Inclusion is evaluated per residue
2. Measures:
angular rms: ARMSMC ARMSSC
cartesian rms: CRMSCA CRMSMC CRMSSC CRMSALL
ERRORS: ERRCA ERRMC ERRSC ERRALL
Reference numbers: NP TN PP
Using target<-->model global superposition in a given subset of distance cutoff DIST:
DISTCA_N denotes the number of CA atom pairs for which distance = |target-model| < DIST
DISTCA_P denotes the percent of CA atom pairs for which distance = |target-model| < DIST
DISTALL_N denotes the number of ALL atom pairs for which distance = |target-model| < DIST
DISTALL_P denotes the percent of ALL atom pairs for which distance = |target-model| < DIST
The following subsets of distance cutoff DIST are defined:
LCS LONGEST_CONTINUOUS_SEGMENT
(Longest continuous sequence under CA RMS cutoff 1.00)
LCS_TS Longest Continuous Segment under CA RMS cutoff <= 1.0 (TOTAL_SCORE)
GDT GLOBAL_DISTANCE_TEST
(the largest set of residues under DISTANCE_CUTOFF (using LCS results))
GDT_N_O-n Estimation of the largest set of residues under distance cutoff <= n.0
GDT_P_O-n Estimation of the percent of residues under distance cutoff <= n.0
GDT_LR-n Local RMS on the set of residues under distance cutoff <= n.0
GDT_TS GDT_TOTAL_SCORE = (GDT_P_O-1 + GDT_P_O-2 + GDT_P_O-4 + GDT_P_O-8)/4.0
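The GDT_TS formula above is a plain average of four percentages. The sketch below applies it to per-residue distances from a single, already computed superposition. This is a simplification: it does not search for the largest residue set under each cutoff. Taking percentages relative to the number of target residues is an assumption, and the function name is hypothetical.

```python
def gdt_ts(distances, n_target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Return (GDT_TS, [percent of residues under each cutoff])."""
    percents = [
        100.0 * sum(1 for d in distances if d <= c) / n_target
        for c in cutoffs
    ]
    return sum(percents) / len(cutoffs), percents


# Target-model CA distances (Angstroms) for an 8-residue target
distances = [0.5, 0.9, 1.5, 3.0, 3.9, 6.0, 7.5, 12.0]
score, percents = gdt_ts(distances, n_target=8)
print(percents)  # [25.0, 37.5, 62.5, 87.5]
print(score)     # 53.125
```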
LCS and GDT (detailed description)
The number of residues in the model for which the closest residue in the target is the correct one, and the distance between them is less than 3.8 Angstroms, is reported. This gives the number of correctly aligned residues.
The number of positions for which the model residue is closest to a residue in the target within +/-4 residues, and the distance is less than 3.8 Angstroms, is reported. This gives the number aligned within 4 residues of the correct value.
ALIGN_A_N Number of a.a. aligned exactly
ALIGN_A_P Percent of a.a. aligned exactly
ALIGN_A4_N Number of a.a. aligned within +/-4 sequence window
ALIGN_A4_P Percent of a.a. aligned within +/-4 sequence window
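The two alignment counts can be sketched as follows, given CA coordinates of the model and target after superposition. The dictionaries keyed by residue number, the function name, and the choice to express percentages relative to the number of model residues are assumptions made for this illustration.

```python
import math


def alignment_counts(model_ca, target_ca, max_dist=3.8, window=4):
    """Return (ALIGN_A_N, ALIGN_A4_N, ALIGN_A_P, ALIGN_A4_P).

    model_ca and target_ca map residue numbers to (x, y, z) coordinates.
    """
    exact = within = 0
    for res, p in model_ca.items():
        # closest target residue to this model residue
        nearest = min(target_ca, key=lambda r: math.dist(p, target_ca[r]))
        if math.dist(p, target_ca[nearest]) >= max_dist:
            continue                      # too far away to count as aligned
        if nearest == res:
            exact += 1                    # closest residue is the correct one
        if abs(nearest - res) <= window:
            within += 1                   # correct within +/-4 residues
    n = len(model_ca)
    return exact, within, 100.0 * exact / n, 100.0 * within / n


target_ca = {k: (3.8 * k, 0.0, 0.0) for k in range(1, 7)}
model_ca = {
    1: (3.8, 0.5, 0.0),     # on top of target residue 1
    2: (7.6, 0.5, 0.0),     # on top of target residue 2
    3: (11.4, 0.5, 0.0),    # on top of target residue 3
    4: (22.8, 0.5, 0.0),    # sits on target residue 6 (shift of 2)
    5: (19.0, 50.0, 0.0),   # far from every target residue
}
print(alignment_counts(model_ca, target_ca))  # (3, 4, 60.0, 80.0)
```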