Preliminary results

Re: Preliminary results

Postby guest on Sat Sep 06, 2008 11:23 am

Given 100% accuracy of the CA trace, what additional information can a main-chain H-bond give you? I guess only
side-chain H-bond prediction is a relevant, challenging problem that CASP needs to address this time or
in the future.
guest
 

Re: Preliminary results

Postby jianlin.cheng on Sat Sep 06, 2008 6:52 pm

Following the lead of the excellent assessments by several groups (Baker, Grishin, McGuffin, and Zhang), I'd like to share our preliminary evaluation of CASP8 tertiary structure predictions with the community: http://sysbio.rnet.missouri.edu/casp8_eva/index.html

--Jianlin
jianlin.cheng
 
Posts: 6
Joined: Sun Jul 20, 2008 8:39 pm

Re: Preliminary results

Postby test on Sat Sep 06, 2008 8:58 pm

Even if HB is considered in evaluation, it shouldn't have the same weight as GDT score or TMscore.
If the whole model is wrong, does it make sense to have good HB?

GDT_HA may be a suitable assessment score.

guest wrote:Given 100% accuracy of the CA trace, what additional information can a main-chain H-bond give you? I guess only
side-chain H-bond prediction is a relevant, challenging problem that CASP needs to address this time or
in the future.
test
 

Re: Preliminary results

Postby jlee on Sat Sep 06, 2008 10:18 pm

I will leave the job of assigning target proteins into HA-TBM/TBM/FM to the CASP8 assessors.
But, I ask the assessors to consider more microscopic criteria when evaluating TBM models.
Subtle differences are likely to be buried by coarse-grained measurements
and will be revealed only by fine-grained criteria.

test wrote:Well, the definitions of TBM and FM are subjective rather than objective. How can what you suggested be implemented without introducing too much artificial bias?
In addition, GDT-TL may be too strict and is likely to bury some subtle differences.

jlee wrote:In CASP7, GDT-HA was used to assess the quality of the C_alpha trace for both TBM and HA-TBM targets.
I would like to see the CASP8 assessors use an even higher-accuracy measure such as GDT-TL (0.25, 0.5, 1.0, 2.0),
especially for HA-TBM targets, as was done in CASP6. Nowadays, protein model quality is improving steadily,
especially for TBM targets, and CASP should encourage predictors to devise more accurate modeling
globally (for FM targets) as well as locally (for TBM targets). I feel that 8A is too large a distance to be
meaningful even for FM targets (although 8A gives us a complacent feeling of good protein modeling).
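To make the effect of the cutoff choice concrete, here is a minimal sketch of a GDT-style score computed from per-residue C_alpha deviations. It assumes the model is already superimposed on the target (real GDT searches over many superpositions), and the two sets of deviations are invented for illustration. The GDT-TS (1, 2, 4, 8 A) and GDT-HA (0.5, 1, 2, 4 A) cutoffs are the standard ones; the GDT-TL cutoffs are those quoted in the post.

```python
# Sketch only: assumes a fixed superposition, unlike the real GDT procedure.
CUTOFF_SETS = {
    "GDT-TS": (1.0, 2.0, 4.0, 8.0),
    "GDT-HA": (0.5, 1.0, 2.0, 4.0),
    "GDT-TL": (0.25, 0.5, 1.0, 2.0),  # cutoffs as quoted in the post
}

def gdt_like(deviations, cutoffs):
    """Average over cutoffs of the percentage of residues within each cutoff."""
    n = len(deviations)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical C_alpha deviations (in A) for two good models of a 10-residue target.
model_a = [0.2, 0.3, 0.4, 0.4, 0.6, 0.7, 0.9, 0.9, 1.5, 1.8]
model_b = [0.8, 0.9, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0, 1.8, 1.9]

for name, cutoffs in CUTOFF_SETS.items():
    print(name, round(gdt_like(model_a, cutoffs), 2), round(gdt_like(model_b, cutoffs), 2))
```

With these made-up numbers the two models tie under the GDT-TS cutoffs and separate only under the finer GDT-HA and GDT-TL cutoffs, which is the kind of subtle difference the post is concerned about.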

On the other hand, for the calculation of GDT scores, only positions of C_alpha atoms matter.
Since there are many more non-C_alpha atoms in protein models (CASP8 did not accept C_alpha only models),
CASP8 assessors should consider including additional measures other than the HB score used in CASP7.
Candidate measures include Chi_1 and Chi_12 for all/TBM targets.
One should also consider using the HB score for all targets, not only TBM targets.
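As an illustration of what a Chi_1 measure could look like, the sketch below counts the fraction of residues whose chi1 angle falls within a tolerance of the target value. The 30-degree tolerance, the function names, and the example angles are all assumptions made for this sketch, not the definition used by any CASP assessor.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi1_agreement(model_chi1, target_chi1, tolerance=30.0):
    """Fraction of residues whose chi1 is within `tolerance` degrees of the target.

    Residues without a chi1 angle (None, e.g. Gly or Ala) are skipped.
    The tolerance is an assumed value, not an official threshold.
    """
    pairs = [(m, t) for m, t in zip(model_chi1, target_chi1)
             if m is not None and t is not None]
    if not pairs:
        return 0.0
    hits = sum(angle_diff(m, t) <= tolerance for m, t in pairs)
    return hits / len(pairs)

# Hypothetical chi1 angles (degrees) for five residues; the fourth has no chi1.
model = [-60.0, 175.0, 65.0, None, -170.0]
target = [-65.0, -178.0, -60.0, None, 180.0]
print(chi1_agreement(model, target))
```

The angle difference is taken on the circle, so 175 and -178 degrees count as 7 degrees apart rather than 353. A Chi_12 variant would presumably require both chi1 and chi2 to be within tolerance for a residue to count.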

Just a thought....
jlee
 

Re: Preliminary results

Postby jlee on Sat Sep 06, 2008 10:33 pm

test wrote:Even if HB is considered in evaluation, it shouldn't have the same weight as GDT score or TMscore.
If the whole model is wrong, does it make sense to have good HB?

GDT_HA may be a suitable assessment score.

guest wrote:Given 100% accuracy of the CA trace, what additional information can a main-chain H-bond give you? I guess only
side-chain H-bond prediction is a relevant, challenging problem that CASP needs to address this time or
in the future.


If the whole model is wrong, its HB score will be practically ZERO.

In my opinion, the whole point should be that (1) proteins contain many more atoms than just
the C_alpha atoms, and (2) GDT scores depend only on the positions of C_alpha atoms.
I recommend that you read the CASP7 TBM assessment paper, where you will find a figure
illustrating the difference between a good GDT model and a good HB model.
jlee
 

Re: Preliminary results

Postby Guest on Sun Sep 07, 2008 1:05 am

jlee wrote:
test wrote:Even if HB is considered in evaluation, it shouldn't have the same weight as GDT score or TMscore.
If the whole model is wrong, does it make sense to have good HB?

GDT_HA may be a suitable assessment score.

guest wrote:Given 100% accuracy of the CA trace, what additional information can a main-chain H-bond give you? I guess only
side-chain H-bond prediction is a relevant, challenging problem that CASP needs to address this time or
in the future.


If the whole model is wrong, its HB score will be practically ZERO.

In my opinion, the whole point should be that (1) proteins contain many more atoms than just
the C_alpha atoms, and (2) GDT scores depend only on the positions of C_alpha atoms.
I recommend that you read the CASP7 TBM assessment paper, where you will find a figure
illustrating the difference between a good GDT model and a good HB model.


It is not true that "If the whole model is wrong, its HB score will be practically ZERO."

For an alpha-protein, it is easy to construct a model with a completely wrong
topology but with a good HB-score or chi1/chi2 score. T0465_LEE-SERVER_TS1
is one such example: the TM-score of this model is 0.199, the GDT-score is 0.180,
and the RMSD is 17.9A, all of which are close to random; yet this model gets 50% of H-bonds
correct, with an HB-score higher than most others. I am not sure this kind of
HB-score is very meaningful.

If HB-score is combined with GDT/TM-score, it should have a lighter weight, e.g.
TM-score+HB-score*10%.
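A small worked example of the weighting suggested here. The TM-score of 0.199 and the 50% H-bond figure are the T0465 values quoted above; treating the HB-score as a fraction between 0 and 1 and the second model's numbers are assumptions made purely for illustration.

```python
def combined_score(tm_score, hb_score, hb_weight=0.10):
    """TM-score plus a down-weighted HB-score (HB-score assumed to be a 0..1 fraction)."""
    return tm_score + hb_weight * hb_score

# Wrong-topology model from the post: TM-score 0.199, 50% of H-bonds correct.
wrong_fold = (0.199, 0.50)
# Hypothetical model with a roughly correct fold but few correct H-bonds.
right_fold = (0.55, 0.10)

for weight in (1.0, 0.10):
    print(weight,
          round(combined_score(*wrong_fold, hb_weight=weight), 3),
          round(combined_score(*right_fold, hb_weight=weight), 3))
```

With equal weights the wrong-topology model scores higher (0.699 against 0.65); with the 10% weight the order reverses (0.249 against 0.56).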
Guest
 

Re: Preliminary results

Postby djones on Sun Sep 07, 2008 4:13 am

I'm not sure I care for this recent fad of trying to use hydrogen bonds for model assessment.

It's such a comprehensively flawed concept, that I'm amazed we are still discussing it - but here are
some pertinent comments:

1. As someone has already pointed out, it is only useful for beta sheets - zero usefulness for all-alpha proteins. Even
in beta sheets it's no use for simple beta meanders where the same hydrogen bond pattern can be observed across
a wide range of sheet curvatures. Why use a method which can only be applied to a subset of protein fold types?
The argument should really just finish there, but to continue...

2. Hydrogen bonding is a complex quantum mechanical phenomenon - any purely geometric definition of a hydrogen
bond will be a crude approximation. Assuming we are not going to do semi-empirical quantum calculations, for example, which
crude approximation of a hydrogen bond do we opt to use? The old distance-based DSSP definition? Baker and Hubbard?
Dreiding/CHARMm potential? What cutoff do we set for the minimum energy permissible for a hydrogen bond? What about
steric hindrance, bifurcation or competition with surrounding solvent in accessible areas of the model?

3. What's so special about hydrogen bonds anyway? Why not also look at the similarity of accessible atomic surface area and that way
take the non-polar parts of the model into account? That could even be applied to all protein fold classes - not that I'm seriously
recommending this criterion, I hasten to add!

4. The only reason these hydrogen bond evaluation schemes have any perceived value is that they encompass geometric information
beyond the C-alpha trace. It's plainly daft to evaluate high resolution models on just C-alpha positions but why not just address that issue
directly rather than adding the fuzziness of hydrogen bond definitions into the mix? Use main chain RMSDs or even all-atom RMSDs if you want
more resolution than C-alphas can provide. A main chain atom RMSD of zero will by definition produce exactly the same main chain hydrogen bond list between two models (using simple geometric HB definitions at least). A C-alpha RMSD of zero will not necessarily produce the same main chain hydrogen bond list due to the inaccuracy inherent in building main chain coordinates from C-alpha traces.
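To illustrate point 2, here is a deliberately crude sketch of a purely geometric hydrogen bond test: a backbone N and O count as bonded if they lie within a distance cutoff. The criterion, both cutoff values, and the toy coordinates are assumptions made for this example and do not correspond to any of the published definitions named above.

```python
import math

def hbond_pairs(donor_n, acceptor_o, max_dist):
    """Set of (donor_res, acceptor_res) pairs with N...O distance <= max_dist.

    donor_n and acceptor_o map residue number -> (x, y, z) of backbone N and O.
    Same-residue and adjacent-residue pairs are skipped.
    """
    pairs = set()
    for i, n_xyz in donor_n.items():
        for j, o_xyz in acceptor_o.items():
            if abs(i - j) < 2:
                continue
            if math.dist(n_xyz, o_xyz) <= max_dist:
                pairs.add((i, j))
    return pairs

# Toy coordinates (A): one N...O pair at 2.9 A, another at 3.4 A.
donor_n = {5: (0.0, 0.0, 0.0), 6: (0.0, 0.0, 10.0)}
acceptor_o = {1: (2.9, 0.0, 0.0), 2: (3.4, 0.0, 10.0)}

strict = hbond_pairs(donor_n, acceptor_o, max_dist=3.2)
loose = hbond_pairs(donor_n, acceptor_o, max_dist=3.5)
print(sorted(strict), sorted(loose))
```

The same coordinates give one hydrogen bond under the 3.2 A cutoff and two under the 3.5 A cutoff, so any score built on the bond list inherits the arbitrariness of the cutoff.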

In my view we should be replacing GDT-HA with geometric definitions based on both main chain and side chain atom distances not mixtures of C-alpha metrics combined with arbitrary hydrogen bond definitions.

For example, we could define something like this:

[ GDT(C-alpha / 2A cutoff) + GDT(C-alpha / 1A cutoff) + GDT(main chain / 0.5A cutoff) + GDT(side chain atoms / 0.5A cutoff) ] / 4

This would produce a score that gives some credit for basic alignment accuracy (the C-alpha components), some credit
for main chain geometry (including main chain hydrogen bonds) and the last bit of credit for putting the side chain atoms in the
right places (which will even include side chain hydrogen bonding). Of course the selection of terms and distance-cutoffs is something that
could (and no doubt should) be tuned.
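A minimal sketch of how such a composite could be computed. It reads each GDT(atoms / cutoff) term as the percentage of atoms in that class lying within the cutoff after a single fixed superposition, which is a simplifying assumption; the function names and the deviations are hypothetical.

```python
def fraction_within(deviations, cutoff):
    """Fraction of atoms whose deviation (in A) is at or below the cutoff."""
    return sum(d <= cutoff for d in deviations) / len(deviations)

def composite_score(ca_dev, main_chain_dev, side_chain_dev):
    """Mean of the four terms in the proposed formula, as a percentage."""
    terms = [
        fraction_within(ca_dev, 2.0),          # basic alignment accuracy
        fraction_within(ca_dev, 1.0),          # tighter C-alpha accuracy
        fraction_within(main_chain_dev, 0.5),  # main chain geometry
        fraction_within(side_chain_dev, 0.5),  # side chain placement
    ]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical per-atom deviations (A) between a model and the target.
ca_dev = [0.3, 0.6, 0.9, 1.4, 2.5]
main_chain_dev = [0.2, 0.4, 0.6, 0.8, 0.4, 0.3, 1.0, 2.0]
side_chain_dev = [0.3, 0.7, 1.2, 2.4]

print(round(composite_score(ca_dev, main_chain_dev, side_chain_dev), 2))
```

Keeping the terms in a list makes it easy to tune the selection of atom classes and cutoffs, as the post suggests should be done.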
djones
 
Posts: 7
Joined: Sun Sep 07, 2008 3:21 am

Re: Preliminary results

Postby Guest on Sun Sep 07, 2008 5:20 am

djones wrote:I'm not sure I care for this recent fad of trying to use hydrogen bonds for model assessment.

It's such a comprehensively flawed concept, that I'm amazed we are still discussing it - but here are
some pertinent comments:

1. As someone has already pointed out, it is only useful for beta sheets - zero usefulness for all-alpha proteins. Even
in beta sheets it's no use for simple beta meanders where the same hydrogen bond pattern can be observed across
a wide range of sheet curvatures. Why use a method which can only be applied to a subset of protein fold types?
The argument should really just finish there, but to continue...

2. Hydrogen bonding is a complex quantum mechanical phenomenon - any purely geometric definition of a hydrogen
bond will be a crude approximation. Assuming we are not going to do semi-empirical quantum calculations, for example, which
crude approximation of a hydrogen bond do we opt to use? The old distance-based DSSP definition? Baker and Hubbard?
Dreiding/CHARMm potential? What cutoff do we set for the minimum energy permissible for a hydrogen bond? What about
steric hindrance, bifurcation or competition with surrounding solvent in accessible areas of the model?

3. What's so special about hydrogen bonds anyway? Why not also look at the similarity of accessible atomic surface area and that way
take the non-polar parts of the model into account? That could even be applied to all protein fold classes - not that I'm seriously
recommending this criterion, I hasten to add!

4. The only reason these hydrogen bond evaluation schemes have any perceived value is that they encompass geometric information
beyond the C-alpha trace. It's plainly daft to evaluate high resolution models on just C-alpha positions but why not just address that issue
directly rather than adding the fuzziness of hydrogen bond definitions into the mix? Use main chain RMSDs or even all-atom RMSDs if you want
more resolution than C-alphas can provide. A main chain atom RMSD of zero will by definition produce exactly the same main chain hydrogen bond list between two models (using simple geometric HB definitions at least). A C-alpha RMSD of zero will not necessarily produce the same main chain hydrogen bond list due to the inaccuracy inherent in building main chain coordinates from C-alpha traces.

In my view we should be replacing GDT-HA with geometric definitions based on both main chain and side chain atom distances not mixtures of C-alpha metrics combined with arbitrary hydrogen bond definitions.

For example, we could define something like this:

[ GDT(C-alpha / 2A cutoff) + GDT(C-alpha / 1A cutoff) + GDT(main chain / 0.5A cutoff) + GDT(side chain atoms / 0.5A cutoff) ] / 4

This would produce a score that gives some credit for basic alignment accuracy (the C-alpha components), some credit
for main chain geometry (including main chain hydrogen bonds) and the last bit of credit for putting the side chain atoms in the
right places (which will even include side chain hydrogen bonding). Of course the selection of terms and distance-cutoffs is something that
could (and no doubt should) be tuned.


Good idea. But the cutoffs of 2/1/0.5 are too small. Credit should also be given to TBM models with errors of 3A-4A or even 5A, because those are indeed different from errors of 6A or 8A. In your equation, errors in the region of 0.5A are over-counted. No matter whether you count C-alpha, main-chain, or side-chain atoms, they are highly corrected.
Guest
 

Re: Preliminary results

Postby Guest on Sun Sep 07, 2008 5:22 am

Sorry, I meant "they are highly correlated".
Guest
 

Re: Preliminary results

Postby kevin_karplus on Sun Sep 07, 2008 10:52 am

Guest wrote:
guest wrote:
kevin_karplus wrote:I like Zhang's assessment by TM_score and HB_score (perhaps because it puts my server second, right behind Zhang's). It seems that Zhang fixed the problem in CASP7 of bad models built on good CA traces, if he is now doing best at the Hbonds.



I think HB_score should be used only for a "new category" of prediction, say "H-bond prediction", just like side-chain modeling, not for
traditional 3D-structure prediction, which most groups focus on.


The CASP7 assessors ranked groups by HB and GDT. It's time for CASPs to set up a somewhat consistent criterion.


I disagree—I think we need new assessment methods that better distinguish good models from great models. GDT is a fine measure for the template-free models and for models that are not so great, but once the models start getting good (GDT > 85%, say) then ranking just based on the CA trace is sort of stupid. Getting the model right in traditional 3D modeling is the goal, and getting it right is not just getting the CA atoms in roughly the right places.

Correctness of hydrogen bonds is one measure that helps distinguish among good models. Other measures (all-atom RMSD, chi1 correctness, ... ) can also be applied. My former student, Firas Khatib, has come up with some new topological measures (based on slip knots) that are almost completely orthogonal to the GDT measure, but which usually distinguish experimental models from CASP models. They measure a property that none of us are getting right yet, but which is invisible to GDT. (Sorry, he hasn't made a program that can be used by anyone but him yet; put pressure on David Baker, who has hired Firas as a postdoc.)
kevin_karplus
 
Posts: 7
Joined: Tue Jul 22, 2008 10:05 am
