D.J. Osguthorpe
Molecular Graphics Unit, University of Bath, Claverton Down, Bath, BA2 7AY

Ab Initio Protein Folding

The protein folding methods I am using start from a simplified model of protein structure, with potentials developed to reproduce the physical behaviour of atoms rather than derived from statistical analysis of the database. This is a fully flexible model for folding, as molecular dynamics is used as the conformational space search technique. It is also an evolving model, in that changes are made when it appears the model cannot reproduce protein structures. Currently, the model does not include backbone hydrogen bonding groups; it uses the classical virtual C-alpha bond approximation, with an atom positioned on each C-alpha atom. The side chain representation is also simplified, but has evolved from a single atom centre per side chain to the current 1 to 3 atom model. An early decision was to retain the direction of the C-alpha - C-beta bond, as this simplified the generation of internal potentials around this bond.

The potentials are currently based on 4 major components: internal potentials (bond lengths, valence angles, torsion angles and out of planes), non-bond potentials, surface area solvation potentials and helix/sheet potentials. The basic internal potential around the C-alpha atom is based on a fully flexible phi,psi map of a 2 residue peptide, plus an analysis of the geometry of known proteins. The potential forms vary from simple harmonic terms to quartic polynomials and sums of gaussian functions. In analysing known protein structures, only C-alpha atoms which were NOT assigned to the main secondary structures by the DSSP algorithm were included in the geometry histograms, to remove the bias of the essentially fixed geometry of the main secondary structures. The out of plane potential is used to constrain the geometry around the C-alpha to the appropriate non-planar geometry for L amino acids. The non-bond potential is a Lennard-Jones 10-6 potential, with parameters based on two factors: the energetics of the approach of full atom side chains to each other for like pairs, using a full atom non-bond potential plus partial charges; and least-squares fitting of the r* radius to minimise the deviation of the coordinates of 4 known proteins from their near X-ray positions. The near X-ray positions were generated by template forcing runs from 0.1 A RMS to 0.3 A RMS, to reduce initial clashes due to the conversion to the model representation.
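As an illustration of the non-bond term described above, the following is a minimal sketch of a 10-6 Lennard-Jones-type function. The abstract does not give the exact functional form, so the normalisation used here (coefficients 1.5 and 2.5, chosen so that the minimum lies at r* with depth epsilon) is an assumption, as are the function name, the parameter names and the plain truncation at an optional cutoff.

```python
def lj_10_6(r, r_star, epsilon, cutoff=None):
    """Energy of a 10-6 Lennard-Jones-type pair potential.

    Assumed form (not taken from the abstract):
        E(r) = epsilon * (1.5 * (r_star / r)**10 - 2.5 * (r_star / r)**6)
    which has its minimum, -epsilon, at r == r_star.
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if cutoff is not None and r > cutoff:
        # Plain truncation beyond the cutoff; an assumption for illustration.
        return 0.0
    x = r_star / r
    return epsilon * (1.5 * x**10 - 2.5 * x**6)


if __name__ == "__main__":
    # Hypothetical parameters: r* = 4.0 A, well depth 1.0 (arbitrary units).
    for r in (3.0, 4.0, 6.0, 10.0):
        print(r, lj_10_6(r, 4.0, 1.0), lj_10_6(r, 4.0, 1.0, cutoff=9.5))
```

With this normalisation, r* is directly the position of the energy minimum, which is convenient if r* is the quantity being least-squares fitted.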

The solvent potential is based on the first shell relative surface accessibility (RSA), and includes a linear function of RSA and RSA raised to selected powers. In all cases the solvent potential favours the exposure of the atom grouping; the difference between charged and "hydrophobic" amino acids is the power function, which determines the rate at which the energy goes to 0 as RSA goes to 0. The energy value for the potential is taken from partial atomic energy values for the non-bond energy, such that the solvent energy will roughly counteract this energy.
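One way to picture the solvent term described above is as a weighted sum of a linear and a power-law function of RSA. The sketch below is only an assumed form: the mixing weight, the exponents and the energy scale are hypothetical, and the abstract does not say which exponent belongs to which class of amino acid.

```python
def solvation_energy(rsa, e_scale, power, linear_weight=0.5):
    """Solvent energy that favours exposure (more negative at higher RSA).

    Assumed form (not taken from the abstract):
        E = -e_scale * (w * rsa + (1 - w) * rsa**power)
    The energy is 0 at rsa == 0 and -e_scale at rsa == 1. A larger
    `power` makes the energy approach 0 faster as rsa goes to 0.
    """
    if rsa < 0.0:
        raise ValueError("relative surface accessibility cannot be negative")
    w = linear_weight
    return -e_scale * (w * rsa + (1.0 - w) * rsa**power)


if __name__ == "__main__":
    # Compare two hypothetical exponents over a range of RSA values.
    for rsa in (0.0, 0.1, 0.3, 0.6, 1.0):
        print(rsa, solvation_energy(rsa, 1.0, 1), solvation_energy(rsa, 1.0, 4))
```

In this form `e_scale` plays the role of the energy value that is chosen to roughly counteract the non-bond energy of the atom grouping.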

The helix/sheet potentials are based on energy functions which stabilise the distances and orientations between C-alpha atoms in the helix and sheet geometry, using gaussian functions of distance differences which vary from 0 to 1, reaching 1 when the distances exactly match. There is no sequence dependence in this potential, i.e. it is the same for all C-alpha atoms which can form 2 hydrogen bonds; Pro is treated differently, as it cannot form certain hydrogen bonds.
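The gaussian matching idea described above can be sketched for the helix case as follows. This is an assumed form, not the actual potential: the target C-alpha distances are approximate textbook values for an alpha helix, the gaussian width and weight are hypothetical, the individual factors are combined as a product, and the orientation part of the potential is omitted.

```python
import math

# Approximate C-alpha(i) to C-alpha(i + k) distances in an alpha helix, in A.
# Illustrative values only; not the parameters used in the model.
HELIX_TARGETS = {2: 5.4, 3: 5.0, 4: 6.2}


def gaussian_match(distance, target, width):
    """Return 1.0 when distance == target, falling towards 0 as they differ."""
    return math.exp(-((distance - target) ** 2) / (2.0 * width**2))


def helix_energy(coords, start, targets=HELIX_TARGETS, width=0.5, weight=1.0):
    """Stabilising energy for residue `start`, from C-alpha distances only."""
    score = 1.0
    for separation, target in targets.items():
        d = math.dist(coords[start], coords[start + separation])
        score *= gaussian_match(d, target, width)
    return -weight * score


def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha trace of an idealised alpha helix (approximate geometry)."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(i * step), radius * math.sin(i * step), i * rise)
            for i in range(n)]


def extended_chain(n, spacing=3.8):
    """Straight-line C-alpha trace, used here as a non-helical contrast."""
    return [(i * spacing, 0.0, 0.0) for i in range(n)]


if __name__ == "__main__":
    print("helix:   ", helix_energy(ideal_helix(5), 0))
    print("extended:", helix_energy(extended_chain(5), 0))
```

Because each factor lies between 0 and 1, the product is close to 1 only when all distances match the helical geometry at once; a sum of gaussians would be an equally plausible reading of the description.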

So far, the potential seems to be able to reproduce the geometry and secondary structures of known proteins, i.e. proteins do not move rapidly away from the X-ray structure at low "temperatures". However, folding simulations lead to non-native collapsed structures which have similar amounts of secondary structure energy but "hydrophobic" residues that are too buried. The folding protocol is currently simple: start from an extended structure at 600 K and cool gradually, holding for extended periods at a fixed temperature every 50 K once 500 K is reached, i.e. at 500, 450, 400 K etc. With an infinite non-bond cutoff, proteins collapse at about 450 K. Using a cutoff of 9.5 A leads to much less compact structures, with a significant increase in the amount of secondary structure formed. The collapse in this case occurs at much lower temperatures, 350-300 K, and even the "collapsed" structures are larger than those obtained with an infinite cutoff.
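The cooling protocol described above can be written down as a list of (temperature, number of MD steps) stages. The sketch below is an assumption about how such a schedule might be generated: the 10 K ramp increment, the step counts and the final temperature of 300 K are hypothetical, since the abstract gives only the 600 K start and the holds every 50 K from 500 K.

```python
def cooling_schedule(t_start=600, t_first_hold=500, t_final=300,
                     hold_spacing=50, ramp_step=10,
                     ramp_md_steps=1_000, hold_md_steps=20_000):
    """Return a list of (temperature in K, MD steps) stages.

    The temperature is lowered in `ramp_step` increments. At
    `t_first_hold` and every `hold_spacing` K below it, the simulation
    stays at fixed temperature for the longer `hold_md_steps`.
    All step counts and the ramp increment are hypothetical.
    """
    if ramp_step <= 0:
        raise ValueError("ramp_step must be positive")
    if hold_spacing % ramp_step or (t_start - t_first_hold) % ramp_step:
        raise ValueError("hold temperatures must lie on the ramp grid")
    stages = []
    for t in range(t_start, t_final - 1, -ramp_step):
        is_hold = t <= t_first_hold and (t_first_hold - t) % hold_spacing == 0
        stages.append((t, hold_md_steps if is_hold else ramp_md_steps))
    return stages


if __name__ == "__main__":
    for temperature, n_steps in cooling_schedule():
        label = "hold" if n_steps == 20_000 else "ramp"
        print(f"{temperature:4d} K  {n_steps:6d} steps  ({label})")
```

Integer temperatures are used so that the test for a hold temperature is exact rather than subject to floating-point rounding.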

These problems are demonstrated in 3 folds of the staufen protein: one with an infinite cutoff from an extended chain, and two with a 9.5 A cutoff, one from an extended chain and one from an all-helical conformation. The extended runs started from a temperature of 600 K; the helical run involved cooling to 0.1 K, followed by heating to 400 K and then cooling.
