Discussion: Data-assisted category assessments

Post by kfidelis on Thu Jun 28, 2018 5:17 am

This post describes the evaluation measures and procedures used in the assessments. Sections describing plans for CASP13 (DRAFT currently open for discussion) were written by the CASP13 assessors in this category.

Many of the model accuracy assessment techniques used in the data-assisted modeling assessments have already been described in the documents dedicated to the template-based and template-free assessments. These are available in separate postings on this site and should be consulted for the relevant details.

Specific measures and procedures used in CASP12

Improvement or decline in model accuracy relative to non-assisted models was measured using GDT_TS as the main metric. The GDT_TS measure was supplemented by the QCS, Handedness, CoDM, DFM, and TM-align scores. Ranking was performed using multiple scores and score combinations (see also ranking procedures described in the last section of the template-free posting).

Assessment of data-assisted prediction by inclusion of crosslinking/mass-spectrometry and small angle X-ray scattering data in the 12th Critical Assessment of protein Structure Prediction experiment
Giorgio E. Tamò, Luciano A. Abriata, Giulia Fonti, Matteo Dal Peraro. Proteins. 2018 Mar;86 Suppl 1:215-227. doi: 10.1002/prot.25442.

Specific measures and procedures used in CASP11

Improvement or decline in model accuracy relative to non-assisted models was measured using GDT_TS as the main metric. T-tests comparing all assisted and all non-assisted models were also used. One-tailed paired t-tests were used to evaluate the significance of each group’s improvements in performance over their un-assisted models.
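As a minimal sketch of the per-group test described above, the paired t statistic can be computed from one group's assisted and un-assisted GDT_TS scores on the same targets. The function name and the example scores are hypothetical, and the conversion of t to a one-tailed p-value (from the t distribution with n - 1 degrees of freedom) is left out to keep the example free of external dependencies.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(assisted, unassisted):
    """Return (t, degrees_of_freedom, mean_difference) for paired samples.

    A positive t means the assisted models scored higher on average.
    """
    if len(assisted) != len(unassisted):
        raise ValueError("paired samples must have equal length")
    diffs = [a - u for a, u in zip(assisted, unassisted)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if d_sd == 0:
        raise ValueError("differences have zero variance")
    t = d_mean / (d_sd / math.sqrt(n))
    return t, n - 1, d_mean


# Hypothetical GDT_TS values for one group on five targets.
assisted = [55.0, 62.0, 48.0, 70.0, 66.0]
unassisted = [50.0, 60.0, 45.0, 64.0, 62.0]

t, dof, d_mean = paired_t_statistic(assisted, unassisted)
print(f"mean improvement = {d_mean:.1f}, t = {t:.2f}, dof = {dof}")
```

For these made-up scores the mean improvement is 4.0 GDT_TS units with t of about 5.66 on 4 degrees of freedom, which exceeds the tabulated one-tailed 5% critical value of 2.132.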

In addition to GDT_TS, measures used in the CASP11 FM assessment were also employed:

GDT_TS, QCS, LDDT, and MolProbity, as well as TenS and ContS

As well as the “TBM-style” combined score including:

GDT_HA, GDC_ALL, LDDT, SG-score, and 0.2*MolProbity score

Models were ranked using the Z-score analysis, statistical tests, and head-to-head comparisons.
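The combination of a weighted multi-measure score with Z-score ranking can be sketched as follows. This is an illustration under stated assumptions, not the assessors' exact procedure: the weights simply follow the list above (1 for each measure, 0.2 for MolProbity), the MolProbity Z-score is negated on the assumption that lower MolProbity values are better, and any outlier removal or flooring of negative Z-scores is omitted. All function names and example values are hypothetical.

```python
from statistics import mean, pstdev

# Weights follow the "TBM-style" list in the text; sign handling is an assumption.
WEIGHTS = {"GDT_HA": 1.0, "GDC_ALL": 1.0, "LDDT": 1.0, "SG": 1.0, "MolProbity": 0.2}
LOWER_IS_BETTER = {"MolProbity"}


def z_scores(values):
    """Standardize raw scores across all models for one measure."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_scores(models):
    """models: {model_name: {measure: raw_score}} -> {model_name: combined Z}."""
    names = list(models)
    total = {name: 0.0 for name in names}
    for measure, weight in WEIGHTS.items():
        zs = z_scores([models[name][measure] for name in names])
        for name, z in zip(names, zs):
            if measure in LOWER_IS_BETTER:
                z = -z  # flip so that a higher Z is always better
            total[name] += weight * z
    return total


def rank(models):
    """Return model names ordered from best to worst combined score."""
    scores = combined_scores(models)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical raw scores for three models of one target.
models = {
    "model_A": {"GDT_HA": 60.0, "GDC_ALL": 55.0, "LDDT": 0.80, "SG": 70.0, "MolProbity": 1.5},
    "model_B": {"GDT_HA": 50.0, "GDC_ALL": 45.0, "LDDT": 0.70, "SG": 60.0, "MolProbity": 2.5},
    "model_C": {"GDT_HA": 40.0, "GDC_ALL": 35.0, "LDDT": 0.60, "SG": 50.0, "MolProbity": 3.5},
}

ranking = rank(models)
print(ranking)
```

Standardizing each measure before summing keeps measures on different scales (for example LDDT in 0 to 1 and GDT_HA in 0 to 100) from dominating the combined score.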

In addition, models generated using simulated contacts mimicking sparse NMR data were compared to those obtained with the standard CNS package (see reference paper for details).

Assessment of CASP11 contact-assisted predictions
Lisa N. Kinch, Wenlin Li, Bohdan Monastyrskyy, Andriy Kryshtafovych, Nick V. Grishin. Proteins. 2016 Sep;84 Suppl 1:164-180. doi: 10.1002/prot.25020.

For simulated NMR data:
Models will be assessed against experimental reference structures (generally X-ray crystal structures) with a “TBM-style” combined score including (at least):
GDT_HA, GDC_ALL, LDDT, and MolProbity score

Predicted models will also be compared with models generated from the same Ambiguous Contact Lists using the ASDP NMR structure generation software, as described in:
Huang, Y.J.; Powers, R.; Montelione, G.T. J. Amer. Chem. Soc. 2005, 127: 1665 - 1674. Protein NMR Recall, Precision, and F-measure scores (RPF Scores): Structure quality assessment measures based on information retrieval statistics.

Huang, Y.P.; Mao, B.; Xu, F.; Montelione, G.T. J. Biomol. NMR 2015, 62: 439 - 451. Guiding automated NMR structure determination using a global optimization metric, the NMR DP score.
Ideally, predictors will provide models that are more accurate than those generated with these standard NMR data analysis tools.

Models will also be assessed against the corresponding simulated NOESY Peak Lists using the RPF-DP Score, as described in:
Huang, Y-J.; Tejero, R.; Powers, R.; Montelione, G.T. PROTEINS: Struct. Funct. Bioinformatics 2006 62: 587 - 603. A topology-constrained distance network algorithm for protein structure determination from NOESY data.

Huang, Y.; Rosato, A.; Singh, G.; Montelione, G.T. Nucleic Acids Research 2012, 40: W542 - 546. RPF – A quality assessment tool for protein NMR structures.

Rosato, A.; Vranken, W.; Fogh, R.H.; Ragan, T.J.; Tejero, R.; Pederson, K.; Lee, H.-W.; Prestegard, J.; Yee, A.; Wu, B.; Lemak, A.; Houliston, S.; Arrowsmith, C.; Kennedy, M.; Acton, T.B.; Liu, G.; Xiao, R.; Montelione, G.T.; Vuister, G.W. J. Biomol. NMR 2015, 62: 413 - 424. The second round of critical assessment of automated structure determination of proteins by NMR: CASD-NMR-2013.
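The information-retrieval idea behind recall, precision, and F-measure can be illustrated with a deliberately simplified sketch. It treats the data as a set of unambiguous residue-pair contacts and the model as the set of pairs it brings into contact. The published RPF method works on ambiguous NOESY peak assignments and further normalizes the F-measure into the DP score; neither step is reproduced here, and the function name and example pairs are hypothetical.

```python
def recall_precision_f(observed_pairs, model_pairs):
    """Return (recall, precision, F-measure) for two collections of contact pairs.

    recall    = fraction of observed contacts that the model satisfies
    precision = fraction of model contacts that the data support
    """
    # Sort each pair so that (i, j) and (j, i) count as the same contact.
    observed = {tuple(sorted(p)) for p in observed_pairs}
    model = {tuple(sorted(p)) for p in model_pairs}
    true_pos = len(observed & model)
    recall = true_pos / len(observed) if observed else 0.0
    precision = true_pos / len(model) if model else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Hypothetical contacts: four seen in the data, three present in the model.
observed = [(1, 5), (2, 8), (3, 9), (4, 10)]
model = [(5, 1), (2, 8), (6, 12)]

recall, precision, f_measure = recall_precision_f(observed, model)
print(recall, precision, f_measure)
```

Here two of the four observed contacts are satisfied (recall 0.5) and two of the three model contacts are supported by the data (precision 2/3), giving an F-measure of 4/7.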

Models will be assessed against simulated RDC data using the RDC Q-Factor Metric, with the program REDCAT.

Valafar, H.; Prestegard, J.H. J Magn Reson. 2004 Apr;167(2):228-41. REDCAT: a residual dipolar coupling analysis tool.
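One commonly used definition of the RDC Q-factor is the RMS deviation between observed and back-calculated couplings divided by the RMS of the observed couplings. The sketch below assumes that definition and assumes the back-calculated couplings are already available (fitting the alignment tensor to the model is not shown); the normalization used by REDCAT may differ in detail. The function name and the example values are hypothetical.

```python
import math


def rdc_q_factor(d_obs, d_calc):
    """Q = sqrt(sum((D_obs - D_calc)^2)) / sqrt(sum(D_obs^2)).

    Lower Q indicates better agreement between model and RDC data.
    """
    if len(d_obs) != len(d_calc):
        raise ValueError("observed and calculated RDC lists must match in length")
    if not d_obs:
        raise ValueError("need at least one coupling")
    num = sum((o - c) ** 2 for o, c in zip(d_obs, d_calc))
    den = sum(o ** 2 for o in d_obs)
    if den == 0:
        raise ValueError("observed couplings are all zero")
    return math.sqrt(num / den)


# Hypothetical couplings in Hz.
d_obs = [10.0, -5.0, 3.0, 8.0]
d_calc = [9.0, -4.0, 3.0, 10.0]

q = rdc_q_factor(d_obs, d_calc)
print(f"Q = {q:.3f}")
```

For these made-up couplings the result is Q of about 0.174; a model that reproduces the data exactly gives Q = 0.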

For Predictions Guided by SAXS Data

As SAXS measures all electrons in the particle, and so reflects oligomerization state and any disordered residues, the modelers have been provided the full sequence of the construct used in the SAXS data collection. Traditionally, evaluation has been based on comparing the portion of the model directly corresponding to the crystal structure. For those samples that are monomeric and well-ordered, the assessment could follow previous CASP assessments. However, for the more complex samples, there are several additional factors that need to be considered. See the discussion in Ogorzalek et al (2018) on experimental factors arising in the data-assisted category for CASP12.
1) Based on previous experience of the SAXS beamline staff, roughly half of the targets are expected to multimerize. Assessment will be done on the individual subunit and on the multimeric complex, comparing only the portion of the model actually modeled in the crystal structure.
2) The crystallographic lattice may have captured a conformation that is not the prevalent conformation in solution. Thus, there will be an additional assessment of how well the full model (containing the entire sequence) agrees with the experimental SAXS data itself (see last section for details).

Additional measures that will be used in the assessment of SAXS-guided models in CASP13

1) All submitted models will be assessed against the crystal structure, using the range of tests available, including GDT_TS, LDDT, CAD, and SG. Comparison will be limited to only the sequence modeled in the crystal structure.
2) For targets whose SAXS data indicate multimerization, we will additionally compare the models with the multimeric state. To do this, we need the appropriate multimer for the assessment. To identify the proper multimer, we will compare all possible multimers of relevant stoichiometry found in the crystallographic symmetry against the experimental SAXS data.
3) For multimeric proteins where there are regions missing in the crystal structure, we will add back the missing regions to the crystallographic multimer model. We will use BilboMD to create an ensemble of disordered regions for each individual multimer. The ensemble of each individual multimer that best fits the experimental data will be used for the submitted model assessment, with assessment focused only on the portion of the sequence in the crystal structure.
4) For all submitted models, we will assess the submitted model against the experimental SAXS data. We anticipate that one or more of the targets may be in a different conformation in solution than in the crystallographic lattice formed artifactually under crystallization conditions. Thus, comparison to the experimental SAXS data may reveal submitted models that are more accurate in overall conformation than the crystal structure itself. We will assess models in reciprocal space, using FoXS to generate the predicted SAXS curve, and the Volatility Ratio (Hura, Nature Methods, 2013; Brunette, Nature, 2015) and χ²free (Rambo, Nature, 2013) to compare predicted and experimental SAXS curves. We will also assess models in real space, using AUTOGNOM to generate the p(r) curves from the predicted SAXS curves. The exact statistical metric for the real space comparison is not yet determined.
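The reciprocal-space comparison in item 4 can be illustrated with a plain χ² between a predicted and an experimental scattering curve, with the predicted curve scaled by a least-squares factor. This is only a sketch of the general idea: χ²free and the Volatility Ratio are defined in the cited papers and are not reproduced here, the division by the number of points is an assumed normalization, and the function name and intensities are hypothetical. Both curves are assumed to be sampled at the same q values.

```python
def scaled_chi2(i_exp, sigma, i_calc):
    """Return (chi2, scale) comparing scale * i_calc with i_exp.

    scale is the least-squares factor minimizing
    sum(((i_exp - scale * i_calc) / sigma) ** 2).
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)):
        raise ValueError("all curves must have the same number of points")
    if not i_exp:
        raise ValueError("need at least one data point")
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    if den == 0:
        raise ValueError("predicted curve is all zeros")
    scale = num / den
    chi2 = sum(((e - scale * c) / s) ** 2
               for e, c, s in zip(i_exp, i_calc, sigma)) / len(i_exp)
    return chi2, scale


# Hypothetical intensities at four q values.
i_calc = [100.0, 80.0, 50.0, 20.0]   # predicted from a model
sigma = [5.0, 4.0, 3.0, 2.0]         # experimental errors

# Experimental curve that is exactly twice the prediction: perfect fit after scaling.
chi2_exact, scale_exact = scaled_chi2([200.0, 160.0, 100.0, 40.0], sigma, i_calc)

# Experimental curve with small deviations from the scaled prediction.
chi2_noisy, scale_noisy = scaled_chi2([205.0, 158.0, 101.0, 41.0], sigma, i_calc)

print(scale_exact, chi2_exact)
print(scale_noisy, chi2_noisy)
```

Fitting the scale factor first removes the arbitrary intensity scale between the two curves, so the remaining χ² reflects differences in curve shape only.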

Brunette TJ, Parmeggiani F, Huang PS, Bhabha G, Ekiert DC, Tsutakawa SE, Hura GL, Tainer JA, Baker D. Exploring the repeat protein universe through computational protein design. Nature. 2015 Dec 24;528(7583):580-4. doi: 10.1038/nature16162. Epub 2015 Dec 16. PubMed PMID: 26675729; PubMed Central PMCID: PMC4845728.

Hura GL, Budworth H, Dyer KN, Rambo RP, Hammel M, McMurray CT, Tainer JA. Comprehensive macromolecular conformations mapped by quantitative SAXS analyses. Nat Methods. 2013 Jun;10(6):453-4. doi: 10.1038/nmeth.2453. PubMed PMID: 23624664; PubMed Central PMCID: PMC3728378.

Ogorzalek TL, Hura GL, Belsom A, Burnett KH, Kryshtafovych A, Tainer JA, Rappsilber J, Tsutakawa SE, Fidelis K. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy. Proteins. 2018 Mar;86 Suppl 1:202-214. doi: 10.1002/prot.25452. Epub 2018 Feb 7. PubMed PMID: 29314274.

Rambo RP, Tainer JA. Accurate assessment of mass, models and resolution by small-angle scattering. Nature. 2013 Apr 25;496(7446):477-81. doi: 10.1038/nature12070. PubMed PMID: 23619693; PubMed Central PMCID: PMC3714217.