Adam Zemla Andrej Sali Bernd Kramer Daniel Fischer Francisco Melo Francisco Domingues Jaap Flohil Mansor Saqi Michael Braxenthaler Motonori Ota Robert Harrison Robert Brasseur Rajgopal Srinivasan Ross King Stan Galaktionov Steven R Ness Shane S.Sturrock
Adam Zemla Adam Zemla Albion E. Baucom Alexei Finkelstein Andrej Sali Arne Eloffson Arne Elofsson Azat Badretdinov Boris Reva Chris Bystroff David Shortle Dietlind Gerloff Dmitry Rykunov Enoch Huang Eugene Demchuck Francisco Melo Hannes Floeckner Ilya A Vakser Inna Dubchak Irene Weber Leonid Mirny Lihua Yu Liping Wei Matt J. Carlson Michael Bass Olivier Lichtarge Peter Avbelj Peter J Munson Ralf Zimmer Ram Samudrala Robert Brasseur Robert MacCallum Robin Munro Russ Altman Scott M. Le Grand Serge Batalov Stan Galaktionov Ursula Egner Valentina Di Francesco Vidna Epa Zhiping Weng
Sequence-Structure Profiles
A POSTER BY :
Arne Elofsson
Dep. of Biochemistry, University of Stockholm, 10691
Stockholm, Sweden
email : arne@biokemi.su.se
We have extended profile methods to detect protein folds
having structural similarity but low similarity to sequence
probes. These methods combine sequence substitution tables with
structural properties to form a combined profile. The structural
properties used in this study include distances between residues,
exposed areas, areas buried by polar atoms and properties of the
original 3D profile method. We compared the performance of these
combined profiles with different sequence matrices and with the
original 3D profile method. To overcome problems of finding the ideal
gap-penalties and weights used with these profiles, we used a genetic
algorithm to optimize these parameters. The performance of these
combined profiles is rigorously tested by cross validation using
independent test and training sets. These studies show that the
combined profiles perform better than profiles based on either
structural or sequence information alone.
A POSTER BY :
Thomas L. Huber and Andrew Torda
Research School of Chemistry, Australian National University,
Canberra, ACT 0200, Australia
email : Andrew.Torda@anu.edu.au
We have built low resolution force fields, optimised for
threading calculations. Parameter tuning was treated as
an optimisation problem and the force fields generated so as
to maximise the z-score of a set of more than 300 calibration
proteins.
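As a rough illustration of the quantity being maximised, the sketch below computes a z-score for one protein from a native energy and a list of decoy energies. The function name, the sign convention (larger is better) and the use of the population standard deviation are assumptions made for this example, not details taken from the poster.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Number of standard deviations by which the native energy lies
    below the mean decoy energy (assumed convention: larger is better)."""
    mean = statistics.fmean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (mean - native_energy) / spread


# Hypothetical energies for a single calibration protein.
decoys = [-2.0, -1.0, 0.0, 1.0, 2.0]
native_z = z_score(-4.0, decoys)
```

A parameter optimiser would then adjust the force field terms so that this value, averaged or summed over the calibration set, increases.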
We have used a simple clustering method to reduce the number
of parameters and can clearly show the trade-off between
generalisation and recall. After parametrisation on one set
of proteins, the method produces excellent results on a set of
proteins with low homology to the calibration proteins.
The parameter classification has also allowed us to show how
force field performance (fold recognition) deteriorates with
a decrease in the number of parameters.
Finally, we have built a force field which can be used with a
full Needleman and Wunsch algorithm for sequence to structure
alignments. This guarantees us O(n^3) (n-cubed) running time
rather than the more expensive double linear programming
approach. It also avoids problems with the "frozen
approximation" such as bias from the original sequence of
template structures or non-convergence of iterations.
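The dynamic programming idea behind a sequence to structure alignment can be sketched as follows. This is a minimal example under stated assumptions: it uses the common linear-gap recurrence, which is a simplification of the full algorithm the abstract refers to, and the score matrix (fitness of residue i at structure site j) and the gap penalty are hypothetical inputs rather than the authors' force field.

```python
def needleman_wunsch(score, gap=-1.0):
    """Global alignment of n sequence residues to m structure sites.

    score[i][j] is a hypothetical fitness of residue i at site j.
    Returns the optimal total score and the aligned (i, j) pairs.
    """
    n, m = len(score), len(score[0])
    # best[i][j]: best score for the first i residues and first j sites
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                best[i - 1][j] + gap,                      # residue unaligned
                best[i][j - 1] + gap,                      # site unoccupied
            )
    # Trace back from the corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Two residues threaded onto three sites; site 1 is skipped.
total, alignment = needleman_wunsch([[2, 0, 0], [0, 0, 2]])
```

Because each cell depends only on already computed neighbours, the alignment is exact for the given score matrix and needs no iteration.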
These methods have been glued together in a Tcl interpreter
which provides a flexible method for scanning libraries of
folds and performing sequence to structure alignments and
searches.
A POSTER AND DEMONSTRATION BY :
A. Thomas-Soumarmon, N. Benhabiles and R. Brasseur
(1) INSERM U10, Hopital Bichat-Claude Bernard, Paris, France
(2) CBMN, Fac. Univ. Gembloux. Passage des Deportes, 5030 Gembloux,
Belgium
email : brasseur@fsagx.ac.be
We developed OSIRIS, a two step procedure to calculate 3D
structures of globular proteins from primary sequences
(Brasseur, 1995, J. Mol. Graph. 13, 312-322). In the first step,
21 to 124 couples of phi and psi are successively attributed to
all amino acids of the sequence and energies are calculated.
Then a molecule that combines the conformations of all amino acid
energy minima is built. It has most of the native secondary structures
and is further folded during the second step of OSIRIS, an
angular dynamics run. The structure of minimal total energy is
saved. The empirical equation for energy involves classical
forces plus terms for internal and external solvation (Lins, L.
et al 1995, Faseb J. 9, 535-540). All atoms are explicitly
described and the energy between atoms i and j is distributed
onto the dihedral axes of the shortest pathway between i and j.
Several modifications of procedures were tested. In step 1,
instead of deciding the conformation of all amino acids after
one run of calculation, only those amino acids with low energy
were frozen. Then, calculations were repeated and the cutoff of
energy was algebraically increased at each repeat to stabilize
more and more amino acids. The calculation ended when all amino
acids were frozen. The procedure resulted in initially blocking
patches of sequence which then acted as nucleation centers since
the molecule structure lengthened from there. This increased
calculation times but significantly improved the structures,
especially for CaBP, APP, BPP, ubiquitin, cytochrome b256 and
uteroglobin. Major improvements were obtained on turns that
were more closely mapped and calculated. As a consequence,
structures were more folded after step 1 but still were not
tightly packed, demonstrating that step 2 of OSIRIS, the
angular dynamics applied to the dihedral space, was still
required.
Comparative results will be presented. In the meantime, in
order to optimize OSIRIS on a large set of proteins we are 1)
linking together 50 Pentium Pro machines in a parallel network (parallel
processing) in order to optimize computing times, 2)
agglomerating calculated molecules via a non-hierarchical
clustering method in order to be able to objectively compare
them and, using this clustering approach, 3) testing different
equations of energies in order to define how they play on
structures. Preliminary results will be discussed.
Brutal Optimisation of Force Fields For Threading
OSIRIS: A computing procedure for ab-initio prediction of
protein structure
MODELLER
A DEMONSTRATION BY :
Andrej Sali
Rockefeller University, 1230 York Ave, New York, NY 10021
email : sali@rockvax.rockefeller.edu
Homology protein structure modeling by satisfaction of spatial restraints, as implemented in our program MODELLER (URL http://guitar.rockefeller.edu/), will be demonstrated. The program will be applied to the modeling of one protein sequence. First, the template structures will be identified. Next, an alignment of the target sequence with the template structures will be derived. The model will then be built and evaluated, which will be followed by re-alignment and re-building of the model. Finally, the model will be compared with the actual structure of the protein being predicted. The demonstration will emphasize several recent improvements of MODELLER, including improving the alignment between a sequence and a structure, identifying the accurate regions of an alignment without knowing both structures, more accurate modeling of insertions, and constructing a consensus model that is generally closer to the actual structure than any of the individual predictions.
A POSTER BY :
Roberto Sanchez, Richard Do, Julie Zikherman, Andrej Sali
Rockefeller University, 1230 York Ave, New York, NY 10021
email : sali@rockvax.rockefeller.edu
Our approach to comparative protein structure modeling based on satisfaction of spatial restraints will be outlined (Sali and Blundell, J.Mol.Biol. 234, 779, 1993). In the first stage of the method, the alignment between the sequence to be modelled and related template structures is obtained. In the second stage, restraints on various distances, angles, and dihedral angles in the sequence are derived from its alignment with the template structures. The restraints are expressed in the most general form as conditional probability density functions. Finally, the 3D model is obtained by minimizing violations of homology-derived and energy restraints, using conjugate gradients and molecular dynamics procedures. The derivation and satisfaction of spatial restraints have been implemented in the MODELLER program that is available at URL http://guitar.rockefeller.edu/. We will describe several new tools in MODELLER that allow improving the alignment between a sequence and a structure, identifying the accurate regions of an alignment without knowing both structures, more accurate modeling of insertions, and constructing a consensus model that is generally closer to the actual structure than any of the individual predictions. In addition, we will evaluate our predictions submitted to the CASP2 meeting.
A DEMONSTRATION BY :
Irene Weber and Robert W. Harrison
Department of Pharmacology, Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia PA 19107
email : harrison@asterix.jci.tju.edu
The molecular mechanics program AMMP (1) has several unique features, including stiffly stable molecular dynamics, extreme flexibility for both the molecular system and the form of the potentials, extensive ability to analyze the molecular potential, and the ability to run without cutoff radii with an efficiency comparable to the use of an 8-10 Å cutoff. The absence of a cutoff radius has been shown to be critical for accurate prediction of binding energies and for the stability of the minimization (1; 2). The flexibility of AMMP means that it is easy to model a wide variety of molecules, not only proteins, and to improve the potentials. AMMP has been applied to build comparative models of proteins, including large protein-protein complexes of 15000 atoms (3-5). New methods have been introduced to generate atomic positions for new residues and insertions, and to use multiple related known structures in homology modeling. Distance potentials derived from multiple starting points can be used in AMMP to restrain structures to be similar to a common core structure. Table-driven contact potentials can be used to bring threading potentials into the molecular mechanics calculations. AMMP can use both 4-dimensional embedding and homotopy methods to 'unwrap' difficult starting problems. Both embedding and homotopy methods converge on problems where standard local methods like conjugate gradients completely fail. Density functional theory has been used to allow AMMP to perform restricted SCF calculations of moderate accuracy in the context of a large molecular problem. AMMP runs on UNIX and WINDOWS95 platforms. Some of the targets for CASP2 were run completely on a WINDOWS95 PC. Proteins as large as 6000 atoms can be easily modelled on a PC. The demonstration will cover the modeling of non-protein molecules, homology modeling of protein structure using several known structures simultaneously, and modeling protein-ligand complexes.
[1] Harrison, R.W. (1993) J. Comp. Chem. 14, 1112-1122.
[2] Kourinov, I. and Harrison, R.W. (1994) Nature Struct. Biol. 1, 735-743.
[3] Harrison, R.W. and Weber, I.T. (1994) Protein Eng. 7, 1353-1363.
[4] Harrison, R.W., Chatterjee, D. and Weber, I.T. (1995) Proteins: Struct. Funct. & Genet. 23, 463-471.
[5] Fu, Z.-Q, Harrison, R.W., Reed, C., Wu, J., Xue, Y.-N., Chen, M.-J. and Weber, I.T. (1995) Protein Eng. 8, 1233-1241.
A POSTER BY :
Robert W. Harrison, Charles Reed and Irene T. Weber
Department of Pharmacology, Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th St., Philadelphia PA 19107
email : weber@asterix.jci.tju.edu
Participation in CASP2 has enabled us to evaluate our current prediction procedure and test different improvements. We have predicted targets in both the comparative modeling and docking categories. Comparative modeling targets 1, 3, 9 and 17 and docking targets 13, 40 and 41 were predicted by molecular mechanics minimization with the program, AMMP. The minimization used all hydrogen atoms, all nonbond and electrostatic terms without distance cutoff, and an improved potential set.
Comparative modeling predictions start with manual positioning of insertions and deletions. Insertions and deletions are chosen to be at the junction of elements of regular secondary structure. With multiple structures, the superimposed structures are displayed together and regions of high structural variability are selected. When multiple possible starting structures are available, the alignment is performed separately for each and the distances between conserved atoms are calculated. These distances are converted into pseudo-NOE distance restraints as either the union set of all distances, or the intersection set of the distances in common to all structures. The union set, the intersection set and the distances from the most similar starting model are then used with different weights as distance restraints on the model structure.
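The union and intersection sets of distances described above can be sketched as follows. This is only an illustration and not AMMP code: the function name, the atom-pair keys, and the choice of turning each restraint into a (minimum, maximum) distance range over the templates are all assumptions.

```python
def restraint_sets(template_distances):
    """Build union and intersection restraint sets.

    template_distances: non-empty list with one dict per template,
    mapping an atom pair such as ("A1", "A5") to a distance in angstroms.
    Returns two dicts mapping atom pairs to (lower, upper) bounds.
    """
    keys_per_template = [set(d) for d in template_distances]
    union_keys = set().union(*keys_per_template)
    common_keys = set.intersection(*keys_per_template)

    def bounds(keys):
        out = {}
        for key in keys:
            values = [d[key] for d in template_distances if key in d]
            # Assumed convention: restrain to the observed range.
            out[key] = (min(values), max(values))
        return out

    return bounds(union_keys), bounds(common_keys)


# Hypothetical distances between conserved atoms in two templates.
template_1 = {("A1", "A5"): 6.0, ("A1", "A9"): 10.0}
template_2 = {("A1", "A5"): 6.4}
union_set, intersection_set = restraint_sets([template_1, template_2])
```

The intersection set contains only pairs seen in every template, so it is the more conservative of the two; giving the sets different weights, as the text describes, is left out of this sketch.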
All new atoms are generated by minimization and the structure is minimized. The standard approach in AMMP is to use a hybrid Krylov method to build new atoms, followed by conjugate gradients and short runs of molecular dynamics. Targets 1 and 3 were minimized with 4-dimensional embedding. A new feature is that AMMP can apply flexible distance restraints to maintain the conformation of local secondary structure or conserved tertiary structure. The comparative modeling predictions applied distance restraints for common regions of multiple starting crystal structures for targets 1 and 3. In target 9 a 5 residue insertion was restrained to have helical conformation using ideal alpha helical distance restraints.
Three different solvent corrections were tested for targets 1 and 3: a correction that used an increased van der Waals radius for polar side chains, and two corrections using discrete water molecules in one or two shells around polar amino acids. These solvent corrections gave only minor improvements in the side chain conformations. Target 9 was predicted from only alpha carbon atoms as a test, and also from the complete structure 2cbp that was provided. Target 17 was predicted with the substrate glutathione and all waters from the parent structure. The docking targets were predicted by energy minimization using all waters and the ligand positioned manually. An automated docking procedure gave results no better than the manual positioning, since parent structures were available with similar ligands. Docking of trypsin inhibitors for targets 40 and 41 used the crystal structure of trypsin with pentamidine that was provided by us (Kourinov and Harrison) for target 33. Detailed comparisons of the predictions and the target crystal structures will be presented. Results will be compared to predictions of accuracy using pair potentials derived from the structural database.
A POSTER BY :
Boris A. Reva, Alexei V. Finkelstein, Michel F. Sanner, and Arthur J. Olson
(1) Department of Molecular Biology, The Scripps Research Institute, 10666 North Torrey Pines Road, Ca 92037, USA; (2) on leave from Institute of Mathematical Problems of Biology Russian Academy of Sciences, 142292, Pushchino, Moscow Region, Russian Federation; (3) Institute of Protein Research, Russian Academy of Sciences, 142292, Pushchino, Moscow Region, Russian Federation
email : breva@scripps.edu
We present two new sets of energy functions for protein structure recognition. The first set of potentials is based on the positions of alpha-carbon atoms and the second on the positions of beta-carbon atoms of amino acid residues. The potentials are derived using a theory of Boltzmann-like statistics of protein structure by Finkelstein et al 1995, which gives a different definition of the reference state for long-range interactions than that used in previous approaches. The energy terms incorporate both long-range interactions between residues remote along a chain and short-range interactions between near neighbors. Distance-dependence is approximated by a piecewise constant function defined on intervals of equal size. The size of this interval is optimized to preserve as much detail as possible without introducing excessive error due to limited statistics. A database of 222 non-homologous proteins was used both for the derivation of the potentials and for the "threading" test originally suggested by Hendlich et al 1990. Special care is taken to avoid systematic error in this test. For threading, we used 102 non-homologous protein chains of 60 to 200 residues. The energy of each of the native structures was compared with the energy of 45 to 20 thousand alternative structures generated by threading. Of these 102 native structures, 93 have the lowest energy with the alpha-carbon-based potentials, and all 102 have the lowest energy with the beta-carbon-based potentials.
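Two ingredients of the test described above lend themselves to a short sketch: looking up a piecewise constant distance-dependent energy, and ranking a native structure against threaded alternatives. The table values, the interval width and the function names are hypothetical, and the derivation of the potentials themselves is not shown.

```python
def pair_energy(table, bin_width, distance):
    """Piecewise constant potential: one value per interval of equal
    size; distances beyond the last interval contribute zero."""
    index = int(distance // bin_width)
    return table[index] if index < len(table) else 0.0


def native_rank(native_energy, alternative_energies):
    """Rank of the native structure; 1 means no alternative has a
    strictly lower energy."""
    return 1 + sum(e < native_energy for e in alternative_energies)


# Hypothetical table for one residue pair type with 2-angstrom intervals.
table = [3.0, -1.0, -0.5]
energy_at_2_5 = pair_energy(table, 2.0, 2.5)    # falls in the second interval
rank = native_rank(-5.0, [-4.0, -6.0, -1.0])    # one alternative scores lower
```

A narrower interval resolves more detail but leaves fewer observations per interval, which is the trade-off the abstract says was optimized.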
Key words: pairwise-residue potentials, protein structure recognition, Boltzmann-like statistics.
[1] Finkelstein, A., Badretdinov, A., Gutin, A. Proteins 23, 151 (1995)
[2] Hendlich, M., Lackner, P., Weitckus, S., Floeckner, H., Froschauer, R., Gottsbacher, K., Casari, G., Sippl, M. J.Mol.Biol. 216, 167 (1990)
A POSTER BY :
Christopher Bystroff
University of Washington, Dept of Biochemistry, Seattle WA 98195-7350
email : bystroff@ben.bchem.washington.edu
Short amino acid sequence patterns (3 to 15 residues) have been identified which correlate strongly with 11 types of local structure in proteins. A new level of detail has been added to both the sequence patterns and to the structural paradigm for some new or poorly understood sequence/structure motifs such as a proline-terminated helix, two types of glycine-terminated helix, a hairpin turn with the sequence PDG, a diverging beta-turn containing the sequence PGD, and a serine-containing loop. Also identified were two sequence signatures for alpha-helix, two for beta-sheet, the SXXE alpha-helix capping box, an aspartate-containing beta-sheet kink, and a histidine-containing C-terminally frayed alpha-helix. Several amino acid frequency profiles have been identified for each structural motif, which are shown to predict the presence of the corresponding structural motif with varying levels of confidence. The "I-sites Library" is composed of 83 such sequence/structure motifs. In a pseudo-blind test of ten proteins of varying classes, the Library predicted the local structure of 72% of the residues with a confidence of at least 20%. Of those, 79% were correct within 90 degrees; 70% within 60 degrees. A total of 68% of the backbone dihedral angles were predicted to within 90 degrees of their true values; 61% within 60 degrees. The occurrence of these motifs transcends protein fold topology and architecture. It is proposed that some of the motifs are protein folding initiation sites, because they fold into the same structure regardless of their non-local context in the sequence. However, such motifs may also have arisen by evolutionary maximization of stability, especially in cases where the structural environment of the motif is largely conserved (i.e. beta strands).
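Scoring dihedral angles "within 90 degrees" requires an angular difference that wraps around at 360 degrees. The sketch below shows one way to compute such a fraction; the function names are hypothetical, and counting each angle separately is an assumption, since the abstract does not spell out the exact scoring procedure.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees,
    always in the range 0 to 180."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_within(predicted, true, tolerance):
    """Fraction of predicted angles within `tolerance` degrees of the
    true values, using the wrap-around difference."""
    hits = sum(angle_diff(p, t) <= tolerance for p, t in zip(predicted, true))
    return hits / len(true)


# 170 and -170 degrees are only 20 degrees apart on the circle.
predicted_phi = [170.0, 0.0, 90.0]
true_phi = [-170.0, 100.0, 60.0]
share = fraction_within(predicted_phi, true_phi, 90.0)
```

Without the wrap-around, the first pair would be scored as 340 degrees apart and counted as a miss.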
A POSTER BY :
Erik Wallin, Tomotake Tsukihara, Shinya Yoshikawa, Gunnar von Heijne and Arne Elofsson
Dep. of Biochemistry, Stockholm University, 106 91 Stockholm, Sweden
email : arne@biokemi.su.se
We have analyzed the structure of mitochondrial cytochrome c oxidase in terms of general characteristics thought to be important for describing the structural architecture of helix bundle membrane proteins. Many aspects of the structure are similar to what has previously been described for the photosynthetic reaction center and bacteriorhodopsin, but significant differences also exist. Our results lead to a considerably more precise picture of membrane protein structure than has hitherto been possible to obtain. A discussion about the implications of this structure for protein structure prediction will be presented.
A POSTER BY :
Enoch S. Huang, Britt Park and Michael Levitt,
Stanford University,
email : eshuang@hyper.stanford.edu
Eighteen low and medium resolution empirical energy functions were tested for their ability to distinguish correct from incorrect folds from three test sets of decoy protein conformations. The energy functions included thirteen pairwise potentials of mean force, covering a wide range of functional forms and methods of parameterization, four potentials that attempt to detect properly formed hydrophobic cores, and one environment based potential. The first of the three test sets consists of large ensembles of plausible conformations for eight small proteins, all of which have correct native secondary structure and are reasonably compact. The second is the set of all subconformations in a database of known protein structures applied to the sequences in that database (ungapped threading). The third is a set of ensembles of 1000 conformations each for 5 small proteins taken from molecular dynamics simulations at 298K and 498K. Our results show that there are functions that are effective for each challenge set; moreover, success in one test is no guarantee of success in another. We examine the factors that seem to be important for accurate discrimination of correct structures in each of the test sets, and note that extremely simple functions are often as effective as more complex functions.
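The second test set, ungapped threading, can be pictured as sliding a sequence along every structure that is at least as long, without insertions or deletions. The sketch below only enumerates the placements; the function name is hypothetical and the energy evaluation of each placement is left out.

```python
def ungapped_threadings(sequence_length, structure_lengths):
    """Yield (structure_index, offset) for every gap-free placement of a
    sequence onto a contiguous stretch of each structure."""
    for index, length in enumerate(structure_lengths):
        # Structures shorter than the sequence give an empty range.
        for offset in range(length - sequence_length + 1):
            yield index, offset


# A 3-residue sequence against structures of 5, 2 and 3 residues.
placements = list(ungapped_threadings(3, [5, 2, 3]))
```

Each placement defines one decoy conformation for the sequence, so the number of decoys grows with both the size of the database and the lengths of its structures.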
A DEMONSTRATION BY :
Daniel Fischer and David Eisenberg
MBI, UCLA
Email : fischer@mbi.ucla.edu
As part of our efforts to cope with the amount of information produced by the several genome sequencing projects, we have developed a Protein Fold Recognition and Genome Analysis Automated Server (FRAGAS). This server incorporates our newly developed methods for protein fold recognition - the computational assignment of newly determined amino acid sequences to three-dimensional protein structures. In addition, it automatically compiles the results of other sequence analysis methods. FRAGAS is a package providing users with computation time, storage and collection of data, and organization of the results for easy analysis. The server now provides single predictions, and developments for entire genome projects are under way. The main features of FRAGAS are:
Single Entry. Sequences are entered only once, at a single site.
Protein Fold Recognition (Threading). Executes and reports the results of various fold-recognition methods, with confidence levels.
Sequence Analysis. Each fold-recognition prediction is annotated with the results of other sequence analysis and prediction methods.
Data Organization. The results of the various methods are organized in an html format. The user obtains an html page with the results of the various methods, and not just a list of links to the methods. The user tailors his/her needs only once (or uses the default settings) and he/she automatically obtains the actual results of all the selected methods. There is no need to compile results separately from additional www accesses or mail messages.
Data Storage. The results are physically stored at our server site. This means that a user can access the information at any time after the submission of the sequence, without having to repeat the process. In addition, the user may request the html results files and install them locally on his/her computer. If the user chooses not to have his/her results publicly available, then the privacy of his/her sequence is maintained and the results are made available only to him/her.
Validation. With time, as the structures of the submitted sequences become available, the server could compile statistics of the performance of the various methods. With these statistics, reliability measures of each method will be computed and made public.
Any fold-recognition method which provides an automatic server can be linked directly to FRAGAS so the user obtains the results of the prediction automatically without additional accesses. FRAGAS can thus serve as an automated evaluator of various automated fold-recognition methods. Submissions of amino acid sequences are welcome at the url: http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html.
A DEMONSTRATION BY :
Ross D. King, Mansoor Saqi and Michael J.E. Sternberg
Imperial Cancer Research Fund
email : rd_king@icrf.icnet.uk
DSC Protein Secondary Structure Prediction
I am pleased to announce the opening of a new web site for the prediction of protein secondary structure. Two prediction modes are available:
1) Given a single sequence. A multiple sequence alignment will be formed and DSC used to predict secondary structure. http://www.icnet.uk/bmm/dsc/dsc_form_align.html
2) Given a multiple sequence alignment. DSC will use this alignment to predict secondary structure. http://www.icnet.uk/bmm/dsc/dsc_read_align.html
The advantages of DSC are:
1) It is very accurate. DSC has a prediction accuracy of 70.1% on a standard set of 126 proteins. This was not significantly different from PHD, a popular prediction method. For medium length sequences DSC was more accurate than PHD, and combining DSC and PHD produced a prediction method more accurate than either.
2) It is free. There is no charge for using DSC.
3) The C source code is available. This would allow you to run DSC on your own system if you had confidential sequences.
DSC is based on simple linear statistics. A paper on the scientific basis of DSC will appear in Protein Science: "Identification and Application of the Concepts Important for Accurate and Reliable Protein Secondary Structure Prediction"
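Prediction accuracy for secondary structure is commonly reported as the percentage of residues assigned to the correct state. The sketch below computes that per-residue figure for a three-state string (H, E, C); it assumes this is the kind of measure behind the quoted accuracy, which the announcement does not state explicitly, and the function name is hypothetical.

```python
def per_residue_accuracy(predicted, observed):
    """Percentage of positions where the predicted state (for example
    H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical six-residue example with one misassigned residue.
accuracy = per_residue_accuracy("HHHECC", "HHEECC")
```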
A POSTER BY :
Ursula Egner and Stefan Schroeder
Research Laboratories of Schering AG
email : ursula.egner@schering.de
Participation in the CASP2 experiment provides an opportunity to examine and critically evaluate our modeling software (INSIGHT/DISCOVER, MSI Inc.; COMPOSER/SYBYL, Tripos Ass.). The targets were chosen to span a range of sequence homology to the template proteins and with regard to the completeness of available crystal structure data for the modeling process. Proteins T0001, T0007 and T0009 seemed to us the appropriate targets to fulfill these requirements.
The prediction of target T0001, dihydrofolate reductase (DHFR) from Haloferax volcanii comprising 162 amino acid residues, was based on three homologous DHFR structures (3DFR, 4DFR, 8DFR, resolution 0.17 nm), with slightly less than 30 % of amino acid residues identical to each of the three DHFRs. A preliminary model was obtained with COMPOSER; some loops were generated with INSIGHT. This model, as checked with PROCHECK, showed considerable deviations in bond lengths and angles from ideal values.
For target T0009, cucumber stellacyanin comprising 109 amino acid residues, only one homologous crystal structure was found in the protein databank, the Type 1 copper protein of cucumber (cucumber basic protein, 1CBP, resolution 0.25 nm), showing a homology of 33 % identical amino acids. Model building was complicated by the fact that only the C-alpha atoms of the protein are deposited in the protein databank. The complete backbone coordinates of this target were built with the BIOPOLYMER module of SYBYL and used as the reference structure. The model building with COMPOSER was based on a sequence alignment for stellacyanin from Rhus vernicifera (B.A. Fields, J.M. Guss, H.C. Freeman J.Mol.Biol. 222, 1053-1065 (1991)), trying to conserve secondary structural elements. All loops were built with the facilities provided by SYBYL and the resulting model was analyzed using the PROTABLE module. The essential copper atom in stellacyanin was included in the DISCOVER force field for further refinement of the model.
The modeling of the third target (T0007, bovine neurocalcin delta, 193 amino acids) was based on the highly homologous structure of bovine recoverin (1REC, 50 % identical amino acid residues, resolution 0.19 nm). According to PROCHECK, the starting model was of good quality except for two loop regions, where other protein structures had to be taken into account for the model.
All model structures were subjected to an energy minimization and molecular dynamics calculation (DISCOVER 2.95/2.97 with the CVFF and CFF91 force fields). At the beginning of the simulation, the C-alpha coordinates of structurally conserved regions were tethered. Wherever possible, conserved water molecules as determined from the crystal structure of the template proteins were included in the calculations. The models were soaked with a 0.8 nm water layer (1.2 nm for target T0007), tethering the outermost 0.2 nm layer of water molecules. PROCHECK evaluation of the energy minimized structures revealed a considerable deviation of the angle omega of up to 20 degrees from planarity, although this angle was forced to the trans conformation in the calculations. In contrast to the recommendations of the DISCOVER handbook, the force constants for keeping the omega angle deviation below 10 degrees had to be enlarged considerably. If the simulation is performed with the CFF91 force field, the deviations in omega are smaller than those obtained with the CVFF force field. As soon as the crystal structures of the targets are available, the modeling procedure will be analyzed in order to identify pitfalls in the simulation protocols and improve the quality of future models.
A POSTER BY :
Leonid Mirny and Eugene Shakhnovich,
Harvard University, Cambridge, MA,
email : leonid@diamond.harvard.edu
We introduce a novel method of deriving a pairwise potential for protein folding. The potential is obtained by an optimization procedure that simultaneously maximizes thermodynamic stability for all proteins in the database.
When applied to the representative dataset of proteins and with the energy function taken in pairwise contact approximation, our potential scored somewhat better than existing ones. However, the discrimination of the native structure from decoys is still not strong enough to make the potential useful for ab initio folding. Our results suggest that the problem lies with the pairwise amino acid contact approximation and/or the simplified representation of proteins rather than with the derivation of the potential. We argue that more detail of protein structure and energetics should be taken into account to achieve better energy gaps. The suggested method is general enough to allow systematic derivation of parameters for more sophisticated energy functions. The internal control of validity for the potential derived by our method is convergence to a unique solution upon addition of new proteins to the database. The method is tested on simple model systems where sequences are designed, using the preset "true" potential, to have low energy in a dataset of structures. Our procedure is able to recover the potential with correlation r = 90%, and we were able to fold all model structures using the recovered potential. Other statistical knowledge-based approaches were tested using this model and the results indicate that they can also recover the "true" potential with a high degree of accuracy.
A POSTER BY :
Albion Baucom, Melissa Cline, David Haussler, Lydia M. Gregoret
University of California Santa Cruz
email : baucom@gorby.ucsc.edu
The mechanism by which sequentially distant amino acids find one another to form beta-sheets in the native structure of proteins is still poorly understood. It is not known to what degree side chain - side chain interactions between residues on neighboring beta-strands are specific. The ability to predict beta-sheet topology would be a great leap towards solving the protein folding problem. Using a database of 344 protein structures from the Protein Data Bank, we have trained a neural network to predict anti-parallel beta-strand pairing in proteins. The neural network considers windows of five neighboring residue pairs simultaneously. The names of the amino acid residues and their neighbors alone provide enough information for the neural network to begin making predictions. In an effort to improve prediction results, a list of structural and biochemical features of amino acids has been constructed from the database and used as supplementary information for the neural network's training set. These features include hydrophobicity, charge, beta-strand propensity, etc. Using these features, we are able to correctly predict true strand pair examples 66% of the time. If we also include strand membership information, our success improves to 80%. We are currently in the process of extending our method to predict entire beta-sheet topologies.
A POSTER BY :
Olivier Lichtarge
University of California, San Francisco
email : licht@cmpharm.ucsf.edu
Even when a protein structure is known, it remains a major theoretical and experimental challenge to localize its functional surfaces and understand the role of their constituent residues. One approach is to tease from the evolutionary record those mutation patterns most likely to indicate functionally important sequence positions. The Evolutionary Trace, ET, accomplishes that goal by identifying patterns of residue conservation that correlate with functional divergence within a protein family. On the protein surface, positions where these ET patterns arise form clusters, precisely at binding and catalytic sites.
We will demonstrate evolutionary tracing in SH2 and SH3 modular signaling domains, G protein heterotrimers and DNA binding domains from intracellular hormone receptors. In these proteins,
functions within subgroups of related proteins.
Most generally, ET is a systematic, transparent and novel method that provides an evolutionary perspective on the functional or structural role of residues in a protein structure.
[1] Lichtarge O., Bourne H.R., Cohen F.E., The Evolutionary Trace Method Defines the Binding Surfaces Common to a Protein Family. Journal of Molecular Biology 257:342-358 (1996).
[2] Lichtarge O., Bourne H.R., Cohen F.E., Evolutionarily Conserved Gβγ Binding Surfaces Support a Model of the G Protein-Receptor Complex. Proc. Natl. Acad. Sci. U.S.A. 93:7507-7511 (1996).
[3] Sheikh S., Zvyaga T.A., Lichtarge O., Sakmar T.P., Bourne H.R., Engineered Metal Ion Binding Sites Linking Transmembrane Helices C and F in Rhodopsin Block Transducin Activation. Nature 383:347-350 (1996).
A POSTER BY :
Zhiping Weng, Sandor Vajda and Charles DeLisi,
Boston University
email : zhiping@mendel.bu.edu
The lack of an effective target function remains a major obstacle to achieving generally accurate docking procedures. Traditional target functions such as interaction energy will be replaced by relatively complete and easily executable free energy functions. The functions are based on a molecular mechanics potential and include empirical terms representing solvation and entropic distributions. Results show that the use of free energy as target function increases the reliability and accuracy of the structures of protein-protein complexes predicted from the structures of monomers. The binding free energies may also be predicted with impressive accuracy in test cases in which the complex is separated, and the resulting monomer structures are used in docking. However, the predicted free energies generally exceed the observed values by several kcal/mol if the docking is based on the structures of the free monomers. Some procedures for avoiding this difficulty will be discussed.
A POSTER BY :
D.S. Rykunov, M.Y. Lobanov, B.A. Reva and A.V. Finkelstein
(1) Institute of Theoretical & Experimental Biophysics, Pushchino, Moscow Reg., Russia (2) Institute of Protein Research, Pushchino, Moscow Reg., Russia (3) Pushchino State University, Pushchino, Moscow Reg., Russia (4) The Scripps Research Institute, La Jolla, CA (USA) (5) Institute of Mathematical Problems of Biology, Pushchino, Moscow Reg., Russia
email : rykunov@sun.ipr.serpukhov.su
Background: Our aim is to apply a threading approach to ab initio prediction of protein folds. As the target folds for threading, we use sets of folding patterns that are partly observed and partly not observed in natural proteins. Each set is obtained from some native protein core using (i) extrapolations of its secondary structure elements and (ii) all the reasonable modifications of the core topology, i.e. a variety of chain pathways through this core. The computations are based on the self-consistent molecular field theory and on the statistical mechanics of chain molecules. The molecular field summarizes the action of long-range interactions. Statistical mechanics finds the stable state and fluctuations of the chain in this field.
Results: The developed computer program is used to single out the stable folds of many protein chains. The results obtained for different proteins are discussed.
Conclusions: The native fold is recognized as one of the stable chain folds, but rarely as the most stable one. The results obtained for the 'modified' topologies are not as good as the results obtained for the native ones. A plausible reason is that some structural details that help to recognize native folds are lost in the modified topologies. However, when the prediction is "averaged" over homologous chains, this facilitates a correct choice of the native protein fold.
A POSTER BY :
A.V. Finkelstein and A.Y. Badretdinov
Institute of Protein Research, Pushchino, Moscow Reg., Russia
email : afinkel@sun.ipr.serpukhov.su
Background: The question of how a protein chain can find its most stable structure without exhaustively sorting through all its possible conformations is known as the "Levinthal paradox". The answer to this question is rather important for the problem of ab initio prediction of protein structures. Our purpose is to elucidate Levinthal's paradox.
Results: A stable globular structure can be rapidly achieved via a "nucleation and growth" folding pathway that provides a continuous entropy-by-energy compensation along the folding pathway and therefore ensures a low free energy of the transition state. The folding is rapid when it occurs in the vicinity of a thermodynamic "all-or-none" transition: here the mis- and semi-folded states cannot "trap" the folding since, even taken together, they are less stable than the initial coil as well as the final stable fold of the chain.
Conclusions: Under the above conditions, an N-residue chain folds normally in exp(N) nsec. Thus, a 100-residue chain finds its most stable fold within minutes rather than in 10 psec ( 10 years, according to the famous paradoxical estimate of Levinthal.
A DEMONSTRATION BY :
Jaap A. Flohil, Tom de Hoop, and Edward E.E. Frietman
Delft University of Technology, Faculty of Applied Physics, Lorentzweg 1, 2628 CJ, Delft, The Netherlands
email : flohil@cp.tn.tudelft.nl
Abstract: A prediction system based on neural network technology.
The prediction system is trained on amino acid residues together with their corresponding secondary structure as an input array, which is matched with the target secondary structure during training. During prediction of each residue, the generated secondary structure is fed back into the network by a feedback procedure.
The prediction system performs an interactive data exchange between two neural network systems. Predictions start from the N-terminal side. After completion of the sequence, the secondary structure data is transferred to the second network system, which is itself trained on sequences in reversed order. The reverse predictor continues with the reversed sequence and reversed feedback structure array, starting at the C-terminal side. The completed structure array is transferred back to the forward predictor, and the prediction process repeats until self-consistency is achieved.
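The alternation between the forward and reverse predictors can be sketched as a fixed-point iteration. In the sketch below the two neural networks are replaced by a deliberately trivial stand-in (`toy_predict`), and the all-coil initial guess, the round limit and the stopping test are assumptions; only the control flow is meant to reflect the description above.

```python
def iterate_to_self_consistency(sequence, forward_predict, reverse_predict, max_rounds=20):
    """Alternate forward and reverse predictors until the structure stops changing.

    forward_predict(seq, feedback) and reverse_predict(seq, feedback) are
    hypothetical callables returning a secondary structure string (H/E/C)
    of the same length as seq; feedback is the previous structure string.
    """
    structure = "C" * len(sequence)  # assumed neutral initial guess
    for round_number in range(1, max_rounds + 1):
        forward = forward_predict(sequence, structure)
        # The reverse network sees the reversed sequence and reversed feedback.
        backward = reverse_predict(sequence[::-1], forward[::-1])[::-1]
        if backward == structure:
            return structure, round_number
        structure = backward
    return structure, max_rounds

def toy_predict(seq, feedback):
    """Stand-in for a trained network: A, E and L are helical, and a helix
    in the feedback array extends by one residue along the reading direction."""
    out = []
    for k, residue in enumerate(seq):
        if residue in "AEL" or (k > 0 and feedback[k - 1] == "H"):
            out.append("H")
        else:
            out.append("C")
    return "".join(out)

result = iterate_to_self_consistency("GAELGGKG", toy_predict, toy_predict)
print(result)
```

In a real system the two callables would wrap the forward-trained and reverse-trained networks, and positions fixed from an external prediction would be held constant in the feedback array.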
This method can be used to perform experiments with alternative expected structures and their possible influences on the folding process. Experiments have shown an increase of prediction accuracy if a proper prediction of secondary structure is implemented in the feedback procedure.
The CASP2 experiment : After collecting the target sequences, we obtained the assignments for the initial guess from the PHD secondary structure server [1]. If the PHD alignment contained sufficient information and the expected accuracy was about 72%, the subset of the prediction consisting of all residues with an expected accuracy > 82% was entered into the feedback array. In cases of lower expected accuracy from PHD, the threshold for entering the feedback array was increased by some percentage points. The positions in the feedback array corresponding to PHD predictions with an expected accuracy < 82% were left open. In these cases the feedback secondary structure was generated only by the prediction system.
Finally, the self-consistent structure is filtered by removing all predicted helix segments smaller than 4 residues. Unlikely structure patterns are patched to their closest pattern match in the structural database. The prediction system has been trained on a recent selection [2] of the DSSP database, in which the proteins have a sequence similarity < 25%.
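The helix-length filter mentioned above can be sketched in a few lines. Replacing the removed helix residues with coil ('C') is an assumption, as the abstract does not say which state they receive; the patching of unlikely patterns against the structural database is not shown.

```python
import re

def remove_short_helices(structure, min_length=4, replacement="C"):
    """Replace runs of 'H' shorter than min_length with the replacement state."""
    def fix(match):
        run = match.group(0)
        return run if len(run) >= min_length else replacement * len(run)
    return re.sub(r"H+", fix, structure)

# The three-residue and one-residue helices are removed; the five-residue helix is kept.
filtered = remove_short_helices("CCHHHCCHHHHHEEHC")
print(filtered)
```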
[1] B. Rost PHD: predicting one-dimensional protein structure by profile based neural networks. Meth. in Enzym., 1996, 266, 525-539.
[2] U.Hobohm, M.Scharf, R.Schneider, C.Sander: "Selection of a representative set of structures from the Brookhaven Protein Data Bank" Protein Science 1 (1992),409-417.
A POSTER BY :
Eugene Demchuk, Donald Bashford, David A. Case
Folding of a Type VI Turn in Aqueous Solution
email : demchuk@scripps.edu
The combinatorial complexity of the protein folding problem increases exponentially with the number of monomers in the polypeptide chain and leads to Levinthal's paradox, which states that sorting through the full configurational space of a regular protein is infeasible in real time. A paradigm of hierarchical self-organization of the polypeptide chain has been suggested to resolve the puzzle [1,2]. It postulates the formation of kinetic intermediates during the process of protein folding. They are viewed as a result of thermodynamic selection among local conformations stabilized presumably by short-range interactions in the polypeptide chain. The potential ability of short peptide sequences to form such intermediates has recently been shown experimentally [3]. Here we present a theoretical study of the conformational dynamics and thermodynamics of a specific linear pentapeptide (AYPYD) which forms a type VI reverse β-turn in aqueous solution [4].
The conformational properties of the peptide have been explored in three solvated molecular dynamics simulations. The first began from an NMR-derived model structure containing a type VIa turn and close stacking interactions between the tyrosine and proline side chains. During 20 nsec of simulation, the peptide made transitions between type VIa and VIb turns, but did not "unfold" to more extended conformations, consistent with unusual stability for folded forms observed by NMR for this sequence. Distances monitored by nuclear Overhauser peaks and side-chain rotamer populations in the trajectory are in good agreement with NMR data.
Two additional 5 nsec trajectories were begun from extended conformations which were selected on the basis of thermodynamic analysis (see below). The first folded into an NMR-like structure within 3 nsec and remained folded. The second was begun from a structure in which the side chain orientations were deliberately mis-folded relative to those required for the turn conformation; this structure did not make a transition to a turn-like state.
Thermodynamics of folding was assessed by a potential of mean force (pmf) consisting of the AMBER/OPLS empirical potential energy function, a macroscopic electrostatic model of polar solvation and a surface area-based model of non-polar solvation; conformational entropy was evaluated by a full systematic search of the dihedral angle space. We found that the gas-phase component strongly favors folding and is nearly cancelled by the polar solvation term, which disfavors folding. The hydrophobic term is small compared with the electrostatic solvation energy and had no effect on folding.
Using random sampling in the conformational space of the polypeptide (to identify potential low-pmf regions), and molecular dynamics simulations (to generate Boltzmann ensembles), we calculated the Helmholtz free energy of folding. It is 1-2 kcal/mol and agrees with the experimental value [4].
This model study has several implications for the protein folding problem.
(1) It demonstrates that exact (or nearly exact) thermodynamic methods are adequate for depicting conformational preferences and folding pathways of polypeptide chain fragments.
(2) Solvation and entropic effects have profound implications for polypeptide folding. The solvated adiabatic potential energy surface is very different from the gas-phase one. This is a result of polar solvation. The relative magnitude of the hydrophobic effect is small.
(3) The net free energy of folding (the free energy change) is a tentative balance between large-scale energy terms. It amounts to < 10% of the change in an individual energy term. It is unlikely that such a small driving force of folding could be calculated by non-exact thermodynamic methods, i.e. a correct prediction of folded conformation without taking into account details of protein-protein and protein-solvent interactions may be problematic.
References:
[1] Karplus, M. & Weaver, D.L. (1994). Protein folding dynamics: the diffusion-collision model and experimental data. Protein Sci. 3, 650-668.
[2] Baldwin, R.L. (1994). Finding intermediates in protein folding. Bioessays 16, 207-210.
[3] Wright, P.E., Dyson, H.J., Feher, V.A., Tennant, L.L., Waltho, J.P., Lerner, R.A. & Case D.A. (1990). Folding of polypeptide fragments of proteins in aqueous solution. In Frontiers of NMR in Molecular Biology (Live, D., Armitage, I.M. & Patel, D., eds), pp. 1-13, A.R. Liss, NY.
[4] Yao, J., Feher, V.A., Espejo, B.F., Reymond, M.T. & Wright, P.E. (1994). Stabilization of a type VI turn in a family of linear peptides in water solution. J. Mol. Biol. 243, 736-753.
A DEMONSTRATION BY :
M.A.S Saqi, R.A. Sayle, R. Russell, P.A. Bates and M.J.E. Sternberg
Bioinformatics Group, GlaxoWellcome Medicines Research Centre
email : mass15599@ggr.co.uk
We present a new algorithm for fold recognition, FOLDFIT. The algorithm is based on matching sequence, predicted secondary structure and burial of a probe with the sequence, known and predicted secondary structure and the known and predicted burial of a set of templates. In FOLDFIT the matches are evaluated by the empirically derived substitution matrices for equivalenced analogous and homologous folds.
A POSTER BY :
R.M. MacCallum and J.M. Thornton
University College London
email : bob@biochemistry.ucl.ac.uk
We explore how the global composition of amino acids and various local sequence patterns can be used to predict the secondary structural class (i.e. all alpha, all beta, mixed) and composition (percent helix, percent strand).
As a reference point, the geometric method of Nakashima et al. (1986) using amino acid composition has been employed. When fully cross-validated on a complete set of non-homologous (< 25% sequence id) protein domain sequences taken from our CATH (Orengo et al.) classification of the PDB, the method classifies 57% of the 470 protein domains correctly.
We then show that the class prediction accuracy can be improved to 68-69% using the composition of local sequence words or n-tuplets. These results can only be obtained through the use of averaged composition profiles from sets of homologous sequences with greater than 160 residues in total (sparse sequence information is detrimental to prediction accuracy).
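To make the idea of local sequence-word composition concrete, the sketch below computes the relative frequencies of overlapping n-tuplets in a sequence and averages them over a set of homologous sequences. The function names, the use of overlapping words and the equal weighting of homologues are assumptions; the abstract does not give these details, and the classifier built on top of the profile is not shown.

```python
from collections import Counter

def ntuplet_composition(sequence, n=2):
    """Return the relative frequency of each overlapping n-residue word."""
    words = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}

def averaged_profile(sequences, n=2):
    """Average the n-tuplet composition over a set of homologous sequences,
    giving every sequence equal weight (an assumption)."""
    profile = Counter()
    for seq in sequences:
        for word, freq in ntuplet_composition(seq, n).items():
            profile[word] += freq / len(sequences)
    return dict(profile)

# Two short hypothetical homologues, dipeptide (n=2) words.
profile = averaged_profile(["AAKAA", "AAKLA"])
print(sorted(profile.items()))
```

With n=1 the same functions reduce to the plain amino acid composition used by the reference method.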
We are also able to rank the predictions using a novel confidence value; the best 50% of the predictions are 80% correct.
The prediction of secondary structural class is somewhat clouded by the arbitrary definition of the classes themselves. We therefore substantiate our findings by showing parallel improvements to the prediction of secondary structural content.
In summary, we have shown that the composition of local sequence-patterns contains more global structural information than just simple amino-acid composition. Amino-acid composition has been shown to improve linear secondary structure predictions (Rost and Sander, 1994). Higher dimensional n-tuplet composition data may yield further improvements to these methods.
[1] Nakashima, H, Nishikawa, K. & Ooi, T. (1986) J. Biochem. 99, 153-162.
[2] Orengo, C. A., Flores, T.P., Taylor, W.R., & Thornton, J.M. (1993) Prot. Eng. 6, 485-500.
[3] Rost, B. and Sander, C. (1994) Proteins. 19, 55-72.
A POSTER BY :
Michael B. Bass and Roland Luethy
Amgen, Inc.
email : mbass@amgen.com
Protein fold recognition, or "threading," was used to predict the fold of several of the CASP target sequences. The threading algorithm employed here uses the Needleman-Wunsch algorithm to produce global alignments without end penalties (Needleman and Wunsch, J. Mol. Biol. 48:443, 1970). The substitution score is based on the three-dimensional properties of the protein structure. These properties are relative surface area, pairwise contacts between residues at least 5 amino acids apart in the protein sequence, and backbone dihedral conformation. Identification of a fold is based on a comparison of the target sequence with a database of unique protein structures (Hobohm, et al., Protein Sci. 1:409, 1992). The scores are length-normalized. A prediction of protein structure is made when the sequence and structure have a Z-score greater than 5.0.
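The two generic ingredients named above, a Needleman-Wunsch alignment without end-gap penalties and a Z-score over the template library, can be sketched as follows. This is not the authors' implementation: the linear gap penalty, the identity scoring function standing in for the structure-based substitution score, and the use of the population standard deviation are all assumptions, and length normalization is omitted.

```python
from statistics import mean, pstdev

def align_free_ends(seq_a, seq_b, score, gap=-2.0):
    """Needleman-Wunsch score with linear gaps and no penalty for end gaps."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # The first row and column stay at zero: leading gaps are free.
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )
    # Trailing gaps are free: take the best cell in the last row or last column.
    return max(max(table[-1]), max(row[-1] for row in table))

def identity(a, b):
    """Placeholder for a structure-derived substitution score."""
    return 1.0 if a == b else -1.0

def z_scores(scores):
    """Convert a {template: score} mapping into Z-scores over the library."""
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    return {name: (s - mu) / sigma if sigma > 0 else 0.0 for name, s in scores.items()}

best = align_free_ends("KVLA", "GGKVLAGG", identity)
z = z_scores({"a": 1.0, "b": 2.0, "c": 3.0})
print(best, z)
```

A template would then be reported as a fold prediction only if its Z-score exceeded the chosen cutoff (5.0 in the abstract).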
A POSTER BY :
V.C. Epa
Biomolecular Research Institute, 343 Royal Parade, Parkville, Vic. 3052, Australia
email : vepa@tigger.mel.dbe.csiro.au
The hemagglutinin-neuraminidase (HN) glycoprotein, which projects spike-like from the surface of all paramyxoviruses, (for example simian virus, Sendai virus, human parainfluenza virus, Newcastle disease virus, and mumps virus), shows neuraminidase activity. Furthermore, the compound 2-deoxy-2,3-dehydro-N-acetyl neuraminic acid shows similar inhibitory activity against both HN and influenza virus neuraminidase. This suggests that there is strong functional similarity of the active sites of these two enzymes and that they both probably use a similar transition state during the catalysis of the cleavage of sialic acid from glycoconjugates. On the other hand, convergent evolution can result in similar 3-dimensional clustering of active site residues being supported on protein backbones of dissimilar folds. The HN sequences do not show statistically significant sequence identity with either the influenza viral neuraminidases or the bacterial neuraminidases. Earlier work had identified conserved amino acids among HN sequences and proposed similarity between HN and influenza viral neuraminidase sequences.
In this work, we use a variety of procedures to identify the 3-dimensional fold or topology of the HN protein and develop a more detailed model, locating the individual secondary structural elements in the HN sequence. After recognizing the fold of the HN protein, i.e. identifying the known protein structures which have similar folds, obtaining an accurate alignment of the HN sequence with such known structures is an important aspect of the modeling problem that is addressed in this work. We examine some of the different protein structure prediction and modeling methods currently available, using the known structures of bacterial and viral neuraminidases as controls to evaluate their accuracy and efficacy in similar situations. The methods used in this work include discrete three-dimensional environmental classification, construction and database search with hidden Markov models, threading in three-dimensional space with knowledge-based potentials containing specific pair interactions, secondary structure prediction with neural networks using profiles from multiple sequence alignments, turn prediction with pattern matching, and hydropathy plots.
We find that the threading method identifies the fold of a major portion of the HN sequence studied as that of a 6 beta-sheet propeller form, similar to the structurally known influenza viral and bacterial neuraminidases. We assign the location of the individual beta-strands in the threaded model, using also supporting evidence from the neural net secondary structure prediction and the hydropathy profile. From our experience with the controls, the global threading alignment was not expected to be accurate for a large protein like HN. In order to improve the alignment accuracy, HN subsequences were threaded onto appropriate substructures of the chosen template. Dependence of the threading fold recognition results on the values of gap penalties used was also investigated. While the threading algorithm performs fairly satisfactorily, discrete environmental classes and hidden Markov models were unsuccessful, seemingly not sensitive enough for the searches with the particular sets of very distantly related protein sequences that were studied here. The model developed in this work is of use to understand better the biological function of the HN protein and design inhibitors of the enzyme as well as to serve as an assessment of some protein structure prediction methods, especially after the forthcoming X-ray crystallographic solution of its structure.
A POSTER BY :
Robin E.J. Munro, Andras Aszodi and Willie R. Taylor
Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom
email : r-munro@nimr.mrc.ac.uk
Our predictions for the targets in the Second Meeting on the Critical Assessment of Techniques for Protein Structure Prediction were made using a variety of techniques - mainly fold recognition and homology modelling but also some ab initio prediction.
For fold recognition we used a multiple sequence threading (MST) method (Taylor, W.R., unpublished). A MULTAL multiple sequence alignment (Taylor, W.R. J. Mol. Evol. 28:161-16, 1988) of the target sequence and its homologues was constructed. This alignment was compared to an extended UCLA benchmark set of 319 structures using MST. The MST prediction uses a simple pairwise potential that favours the packing of conserved hydrophobics into the core. The protein structure with the highest threading score was chosen. The threaded structure was also visualised to determine areas which had been deleted or inserted into, and to check the packing of hydrophobic residues. To prevent a dislocated threading across two protein domains, a modified version of MST was used to thread just one domain of given size.
Homology modelling was carried out using DRAGON (Aszodi, A. and Taylor, W.R. Folding & Design 1:325-334, 1996). The algorithm is based on distance geometry and relies on decreasing the dimensions of the hierarchical projection of a simple model (C alpha and C beta) into 3D. The similarity between the unknown target structure and the scaffold proteins with known structures was described by mapping secondary structure assignments and specific distance restraints between C alpha atoms onto the model through a multiple alignment. The average clustered results of the simplified chains were calculated and this backbone was then modelled in QUANTA (Molecular Simulations Inc.) and CHARMM (MSI) to produce a full coordinate homology model.
Ab initio predictions consisted of a C alpha and C beta chain modelled using secondary structure predictions. Multiple conformations were created, using either DRAGON or combinatorial techniques, in an attempt to predict the fold (Taylor, W.R. Prot. Eng. 6:593-604, 1993). Where other constraints were used in the modelling the best structure, which did not violate any of the pre-defined constraints, was chosen.
Secondary structure prediction techniques used were the predict protein server (PHD - Rost, B., Sander, C. and Schneider, R. CABIOS 10:53-60, 1994) and latterly DSC (King, R.D. and Sternberg, M.J.E. Prot. Sci., in press) as an evaluation; other programs available on the web were used as an additional check. In all cases a MULTAL alignment was studied and a secondary structure prediction evaluated `by eye'.
A POSTER BY :
Valentina Di Francesco, P. McQueen, J. Garnier, P.J. Munson
Laboratory of Structural Biology, NIH, Bethesda, MD, USA Scientific Computing Resource Center, NIH, Bethesda, MD, USA Fogarty International Center, NIH, Bethesda and Laboratoire de Biologie Cellulaire et Moléculaire, INRA, Jouy-en-Josas, France
email : valedf@helix.nih.gov
In order to detect proteins having similar fold even when their aligned sequences are less than 25% identical, several techniques for approaching the fold recognition problem have been recently developed. We present here a new approach which is based on sequences of secondary structures alone. A secondary structure prediction is performed on a sequence of amino acid residues producing a Markov chain based on a three letter code: H for helix, E for beta strand and C for aperiodic structure. This predicted sequence is aligned to hidden Markov models (HMM) of different protein topologies trained on experimentally derived secondary structure sequences of proteins having a specific topology. A total of 24 HMMs have been made for protein topologies in the four main structural classes of protein folds: all alpha, alpha+beta, alpha/beta and all beta. A protein topology recognition experiment is regarded as successful when the secondary structure sequence of a protein is ranked higher by the HMM of its own topology than by HMMs of other topology families. Protein fold recognition is achievable in 63% of the cases, with predicted secondary structure sequences. When the predicted secondary structure sequences are replaced with observed sequences the success rate of correct fold recognition increases to 79%. The success rate obtained when using observed sequences indicates that this approach will become increasingly useful as the accuracy of secondary structure prediction algorithms is improved.
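The ranking step described above can be illustrated with the standard forward algorithm, which gives the likelihood of an H/E/C string under each topology HMM. The two tiny models below and all of their probabilities are invented for illustration only; the 24 trained HMMs of the abstract are not reproduced here.

```python
import math

def log_likelihood(observations, start, trans, emit):
    """Forward algorithm with per-step scaling: log P(observations | HMM)."""
    states = list(start)
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    log_prob = 0.0
    for symbol in observations[1:]:
        scale = sum(alpha.values())
        log_prob += math.log(scale)
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] / scale * trans[r][s] for r in states)
            for s in states
        }
    return log_prob + math.log(sum(alpha.values()))

def make_model(element, symbol):
    """Hypothetical two-state topology model: one regular element plus loop."""
    other = "E" if symbol == "H" else "H"
    return {
        "start": {element: 0.5, "loop": 0.5},
        "trans": {element: {element: 0.9, "loop": 0.1},
                  "loop": {element: 0.2, "loop": 0.8}},
        "emit": {element: {symbol: 0.9, other: 0.05, "C": 0.05},
                 "loop": {"H": 0.1, "E": 0.1, "C": 0.8}},
    }

models = {"all-alpha": make_model("helix", "H"), "all-beta": make_model("strand", "E")}

def best_topology(observations):
    """Return the name of the model that ranks the sequence highest."""
    return max(models, key=lambda name: log_likelihood(observations, **models[name]))

print(best_topology("HHHHCCHHHH"), best_topology("EEEECCCEEEE"))
```

A recognition experiment in the sense of the abstract counts as successful when the model of the protein's own topology is the one returned.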
We also show that these HMMs of protein topologies can be used to further improve the secondary structure prediction accuracy, when a set of probability or confidence values associated to the predicted states of the residues is provided by the prediction algorithm.
We will also present the analysis of the results of the predictions submitted to this year's protein structure prediction contest in the protein fold recognition and ab initio categories.
A POSTER AND A DEMONSTRATION BY :
Stan Galaktionov and Garland Marshall
Washington University Center for Molecular Design 700 Euclid Ave., St. Louis, MO 63110
email : stas@ibc.wustl.edu
Demonstration description: The procedure starts with detection of the set of most probable contacts and subsequently applies "improving" operators to the current contact matrix, bringing the matrix into conformity with the set of optimality criteria. The dynamics of the process are shown by visual comparison of the current contact matrix with the correct one.
Poster description: The procedure starts with the prediction of secondary structure using the methods of Gibrat (J. Mol. Biol., 120, 425, 1978), Maxfield (Bioch., 15, 5138, 1976), Qian (J. Mol. Biol., 202, 865, 1988) and King (Prot. Sci., in press). The consensus location of secondary structure was used for prediction of the coordination number vector (Rodionov, Mol. Biol., 26, 777, 1992). These data were used for the reconstruction of the contact matrix (Galaktionov, Proc 27th Int. Conf. Syst. Sci., 5, 326, 1994). The criteria of optimal organization of the intraglobular contact network were formulated in terms of the relationship of a contact matrix with its powers and eigensystems and used in an iterative procedure of multicriterial optimization over the set of spatially consistent matrices. On the basis of the contact matrix, in turn, a 3D structure prediction was made. The initial reconstruction of spatial structure uses elements of distance geometry. A refinement procedure focused on correction of chosen intraglobular distances and removal of stressed contacts by minimization of the corresponding squared deviations (penalty function P) with respect to CA coordinates. Elements of non-local search were introduced by routinely and systematically inverting the configuration of loop fragments and monitoring the change of the penalty function P. A set of structures with comparable values of the penalty function P was obtained; they were used for correction of the distance matrix for the distance geometry algorithm, etc. Several prediction protocols were tested on 2cro, a protein similar in size and helicity to NK-lysin. Two of them, providing sets of structures with RMS 3.9 - 4.4 Å, were used for prediction of the NK-lysin structure. Three families of feasible structures were obtained.
A POSTER BY :
Lihua Yu, Chrysanthe G. Gaitatzes, Jim V. White and Temple F. Smith
BMERC, College of Engineering, Boston University
email : lyu@darwin.bu.edu
We are predicting the protein structures from primary sequences in the following way:
We use discrete state space models (DSMs) [ref. 1,2,3] to compute the probabilities that the protein belongs to each of four distinct superclasses (alpha, beta, alpha-beta, irregular). If one of the superclasses has a probability greater than 0.8, then (1) we compute probabilities that the protein belongs to each of distinct macroclasses under that superclass (e.g. beta-propeller, four-alpha-helix bundle, alpha-beta-8-barrel...), and (2) we use a smoothing algorithm to predict the secondary structure [ref. 4] according to the macroclass prediction. The smoothing algorithm computes the probabilities that each amino acid is in either a helix, a strand, a turn, or a coil.
The three dimensional structure of the core of the protein is predicted by generalized homologous extension modelling. Since there is no sequence homologue in the Brookhaven database for the given protein, we select a structure analog, based on the predicted structural macroclass and secondary structures. Using QUANTA version 4.1 (copyright MSI Inc.), we mutate the amino acids of the structure analog core into those of the given sequence. We then use CHARMm version 23.1 [ref.5] to refine the three dimensional structure of the core segments by energy minimization and molecular dynamics.
[1] White, J.V., Stultz, C.M., Smith, T.F. (1994) Protein classification by stochastic modelling and optimal filtering of amino-acid sequences. Mathematical Biosciences 119:35-75.
[2] Stultz, C.M., Nambudripad, R., Lathrop, R.H., & White, J.V. Predicting protein structure with probabilistic models. In press.
[3] World wide web URL: http://bmerc-www.bu.edu/psa/
[4] Stultz, C.M., White, J.V., & Smith, T.F. (1993). Structure analysis based on state-space modelling. Protein Science 2:305-314.
[5] Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S., and Karplus, M. (1983) J. Comput. Chem., 4, 187-217.
A POSTER BY :
Scott M. Le Grand, Daniel Fischer, and David Eisenberg
UCLA-DOE Lab of Structural Biology and Molecular Medicine
email : legrand@tesla.mbi.ucla.edu
We present an evaluation of the performance of LINUS, 3D1D Profiles, and several potentials from the Levitt lab on the gapless threading problem. We demonstrate that while a good performance is a necessary condition for a good potential function, it is not a sufficient condition. Several of these potentials achieve a score of 100% recognition on the gapless threading problem only to fail at the more rigorous task of ab initio conformational search. This failure is not due to the Levinthal Paradox! Conformational search is NOT trapped in high energy local minima. The failure is instead due to the location of non-native conformations which are rated by these potential functions as superior to the native structure. This indicates that the gapless threading problem drastically undersamples the conformation space available to proteins. We conclude that potential functions should be evaluated by ab initio conformational search rather than by gapless threading.
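For readers unfamiliar with the gapless threading test, the sketch below shows its basic form: a sequence is placed without gaps at every offset on every structure in a library, each placement is scored, and the potential "recognizes" the native structure if the native placement ranks first. The burial-based toy energy, the residue classes and the example data are assumptions chosen only to keep the example self-contained; they are not any of the potentials evaluated in the abstract.

```python
HYDROPHOBIC = set("AVILMFWC")  # assumed crude two-class residue alphabet

def energy(sequence, burial):
    """Toy energy: -1 when a hydrophobic residue sits at a buried site or a
    polar residue at an exposed site, +1 otherwise."""
    total = 0
    for residue, buried in zip(sequence, burial):
        total += -1 if (residue in HYDROPHOBIC) == buried else 1
    return total

def gapless_threading(sequence, structures):
    """Score every gapless placement of the sequence on every structure.

    structures maps a name to a list of booleans (True = buried site).
    Returns (energy, name, offset) tuples sorted from best to worst.
    """
    n = len(sequence)
    decoys = []
    for name, burial in structures.items():
        for offset in range(len(burial) - n + 1):
            decoys.append((energy(sequence, burial[offset:offset + n]), name, offset))
    return sorted(decoys)

library = {
    "native": [True, False, True, False, True],
    "decoy": [False, False, True, True, False, True, False, False],
}
ranked = gapless_threading("VKLEI", library)
print(ranked)
```

The number of alternatives generated this way grows only linearly with the size of the library, which is the undersampling the abstract refers to: a conformational search can visit vastly more non-native conformations than this test ever presents to the potential.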
A POSTER BY :
Joel Gillespie and David Shortle
JHU School of Medicine, 725 N. Wolfe St., 513 Physiology, Baltimore, MD 21205
email : [??]
Structural analysis of D131D, a fragment model of the denatured state of staphylococcal nuclease, has been extended by obtaining long range distance restraints between protein chain segments by paramagnetic relaxation enhancement. Fourteen unique PROXYL spin labels were introduced at sites which are solvent exposed in the native state, and the resulting enhancements of T2 for the amide protons were measured by NMR spectroscopy. When these data were combined with either measured or estimated correlation times, the r6-weighted, time-averaged distance between the spin label and each of 30 to 80 amide protons could be calculated for each spin labeled protein. On the basis of approximately 700 such loose distance restraints, an ensemble of structures compatible with these restraints were generated by a combined distance geometry / molecular dynamics approach. Because of the large uncertainty in the physical basis of these distance restraints, a series of trial calculations were carried out to determine the largest range in the assigned distance that would lead to convergence in the calculated structures. To obtain an ensemble average structure, the distance maps of 16 independent structures were averaged, and molecular structures were calculated from this averaged map. Several general features were common to all individually calculated structures and to the best average structure. Although several striking non-native interactions can be unambiguously defined, overall the topology of this denatured state is similar in many respects to that of the native state. This finding suggests that the topology of a proteins fold is established in the denatured state in the absence of both tight packing interactions and fixed hydrogen bonding, suggesting that hydrophobic interactions alone may encode global topology. 
Consequently, schemes for predicting protein structure based on hydrophobic burial alone may succeed in defining many of the topological features but fail to correctly predict high-resolution detail.
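The distance weighting used for the spin-label restraints can be illustrated with a short Python sketch. It computes the average of r^-6 over a set of conformations and converts it back to an effective distance. The function name `r6_average` and the sample distances are assumptions made for illustration, and the sketch covers only the averaging step, not the conversion of T2 enhancements and correlation times into distances.

```python
# Effective distance under r^-6 weighting: <r^-6>^(-1/6).
# The ensemble values below are hypothetical, in Angstrom.

def r6_average(distances):
    """Return <r^-6>^(-1/6) for a list of distances in any consistent unit."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

# Three compact conformers and one extended conformer.
ensemble = [10.0, 10.0, 10.0, 30.0]
arithmetic_mean = sum(ensemble) / len(ensemble)
effective = r6_average(ensemble)
```

Because short distances dominate the average, the effective distance stays close to the compact conformers even when an extended conformer is present. This is one reason such restraints are best treated as loose.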
A DEMONSTRATION BY :
Motonori Ota and Ken Nishikawa
National Institute of Genetics, Japan
email : mota@genes.nig.ac.jp
We would like to introduce a threading program. It supports database searches under two protocols. In the forward folding protocol, a sequence seeks the most compatible structure in a structural library constructed from the structural database (PDB). In the inverse folding protocol, a structure picks out its homologous sequences from a sequence database (Pir or Swiss-Prot). The former protocol has been partly successful and has become a major tool for structure prediction, as is well known, but the latter is known to be difficult. We attributed the difficulty to deficiencies in the pseudo-energy potential used to estimate 3D-1D compatibility. We devised a simple criterion to assess the accuracy of the function, evaluated several functions, and concluded that a function normalized by reference to the random environmental state (denatured state) is more sensitive than a function of the ordinary type. This re-normalization procedure is called the minus average operation (MAO).
Using a function with MAO (our function consists of four terms: side-chain packing, hydration, backbone hydrogen bonding and local conformation), we can carry out the three kinds of protein analysis described below.
1. Estimation of the structural stability of point mutants. To align structure and sequence, we use a so-called 3D-profile. By employing MAO (which amounts to introducing the denatured state), we can regard the profile as expressing the structural stability of all point mutations as values of ΔG (free energy difference). With more arithmetic, we can derive the ΔΔG of a mutant protein. The calculated ΔΔG values correlate weakly but significantly with the experimental data (melting temperature, ΔT) of ribonuclease HI mutants. We would like to introduce "our" 3D-profile in "literal" form, as well as how to estimate stability with it.
2. Compatible structure search of a query sequence. Searches using functions with and without MAO were carried out, and they performed equally well on the examples we prepared. In the demonstration, we can show our results and would like to run the program (with MAO) against a small structural library set (about 100). The computational time for a sequence (200 aa) is about 10 minutes on an Indy (SGI). If resources are available, we could attempt a compatible structure search of a new sequence against the large structural library.
3. Compatible sequence search of a query structure. In contrast to the above case, the performance of the inverse folding search was remarkably improved by the introduction of MAO. We would like to show the results of the inverse folding search from globin, ras p21 and actin structures. We also want to demonstrate the search against a small sequence database (a part of Pir section 1, about 1000 entries). The scanning time for a structure (200 aa) is about 10 minutes. If resources are available, we could attempt a compatible sequence search against the complete set of Pir section 1 or 2 or both.
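The renormalization and the stability estimate in item 1 can be sketched in Python as follows. This is a minimal illustration under a stated assumption: the denatured-state reference of a residue type is taken to be its raw score averaged over all environments. The table `RAW`, the two environment labels and the function names are hypothetical and are not the four-term function described above.

```python
# Minus average operation (MAO) sketch. Assumption: the denatured-state
# reference of a residue type is its raw score averaged over all environments.
# RAW is a hypothetical pseudo-energy table, not the authors' function.

RAW = {  # RAW[amino_acid][environment], lower = more compatible
    "L": {"buried": -2.0, "exposed": 1.0},
    "K": {"buried": 2.5, "exposed": -0.5},
    "A": {"buried": -0.5, "exposed": -0.1},
}

def minus_average(raw):
    """Subtract each residue type's mean score over environments."""
    shifted = {}
    for aa, by_env in raw.items():
        mean = sum(by_env.values()) / len(by_env)
        shifted[aa] = {env: e - mean for env, e in by_env.items()}
    return shifted

def profile_3d(environments, table):
    """One row per structural position: score of every residue type there."""
    return [{aa: table[aa][env] for aa in table} for env in environments]

def ddg(profile, position, wild_type, mutant):
    """Stability change of a point mutation read directly from the profile."""
    return profile[position][mutant] - profile[position][wild_type]

MAO = minus_average(RAW)
profile = profile_3d(["buried", "exposed", "buried"], MAO)
change = ddg(profile, 0, "L", "K")   # a buried leucine replaced by lysine
```

After the shift, every residue type has a mean score of zero over environments, so scores of different residue types at the same position become directly comparable. That comparability is what allows a profile row to be read as a set of stability differences.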
A DEMONSTRATION BY :
Bernd Kramer, Matthias Rarey, and Thomas Lengauer
German National Research Center for Information Technology (GMD) Institute for Algorithms and Scientific Computing (SCAI) 53754 Sankt Augustin, Germany
email : bernd.kramer@gmd.de
We present the results of applying our docking software FlexX [1,2,3] to eight protein-ligand docking problems from the CASP2 data set. FlexX is based on a method that combines an appropriate model of the physico-chemical properties of the docked molecules with efficient methods for sampling the conformational space of the ligand. If the ligand is flexible, it can adopt a large variety of different conformations. Each such minimum in conformational space presents a potential candidate for the conformation of the ligand in the complexed state. Our docking method samples the conformational space of the ligand on the basis of a discrete model and uses a tree-search technique for placing the ligand into the active site incrementally. For placing the first fragment of the ligand into the active site, we use hashing techniques adapted from computer vision. The remaining fragments are added by an incremental construction algorithm that is based on a greedy strategy combined with efficient methods for overlap detection and for the search of new interactions. The algorithm also supports the docking of covalently bound ligands if the covalent bond is given. The ranking of the generated solutions is obtained by using a scoring function similar to that developed by Boehm [4] which estimates the free binding energy of protein-ligand complexes.
Before FlexX can be used to automatically generate complex structures, we have to perform a few initial preparation steps manually. For creating an energy-minimized structure of each ligand, we use the model builder SYBYL [5]. Only an arbitrary low-energy conformation is required for the input. In addition, the active site residues of the protein are selected by hand. After superimposing a given protein structure with structurally solved complexes of the same protein with other ligands, all residues within a distance of 8-10 Å of any ligand are used to define the active site.
Finally, we manually reduced the standard set of a few hundred solutions generated by FlexX to a set of at most four models. This was done by clustering similar placements and conformations, and eliminating the solutions with low scores. During this manual selection, we also considered chemical phenomena that are not yet modeled, such as the formation of intramolecular hydrogen bonds.
[1] M. Rarey, B. Kramer, and T. Lengauer. In Proc. of the 3rd International Conference on Intelligent Systems in Molecular Biology , pages 300 - 308. AAAI Press, Menlo Park, California, 1995.
[2] M. Rarey, S. Wefing, and T. Lengauer. J. Comput.-Aided Mol. Design , 10: 41 - 54, 1996.
[3] M. Rarey, B. Kramer, T. Lengauer, and G. Klebe. J. Mol. Biol. , 261: 470 - 489, 1996.
[4] H.-J. Boehm. J. Comput.-Aided Mol. Design , 8: 243 - 256, 1994.
[5] TRIPOS Associates, Inc., St. Louis, Missouri, USA. (1994). SYBYL Molecular Modeling Software Version 6.x .
A POSTER BY :
Inna Dubchak, Ilya Muchnik, Sung-Hou Kim
UC Berkeley, CA; Rutgers University. NJ
email : ildubchak@lbl.gov
We present an improved method for predicting the protein folding class of any protein sequence without clearly identified function or significant homology to known sequences. The original method [1] is based on a global description of the protein chain and a voting process. The best descriptors were selected by a computer-simulated neural network trained on a database of 254 proteins assigned to 83 folding classes [2]. Protein chain descriptors include the overall composition, transition, and distribution of amino acid attributes such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. The method was enhanced by using this technique to assign proteins to the now-central SCOP classification. A representative set of 611 proteins with less than 35% pairwise sequence identity was used for training. Using additional amino acid attributes in the voting (secondary structure and solvent accessibility predicted by PHDsec and PHDacc, volume, charge, hydrophobicity assigned by different methods, etc.) allowed us to increase the number of classes in the prediction scheme.
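The three kinds of global chain descriptor named above can be sketched in Python for a single attribute. The three-class hydrophobicity grouping in `GROUPS`, the choice of quantile fractions and the example sequence are assumptions made for illustration. The neural network and the voting step are not shown.

```python
# Composition / transition / distribution sketch for one attribute.
# The three-class hydrophobicity grouping is an illustrative assumption.

GROUPS = {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}

def encode(sequence):
    """Replace each residue by the name of its attribute class."""
    return [CLASS_OF[aa] for aa in sequence]

def composition(classes):
    """Percentage of residues in each class."""
    return {g: 100.0 * classes.count(g) / len(classes) for g in GROUPS}

def transition(classes):
    """Percentage of adjacent pairs that switch between two given classes."""
    names = sorted(GROUPS)
    pairs = list(zip(classes, classes[1:]))
    out = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            hits = sum(1 for p in pairs if set(p) == {a, b})
            out[(a, b)] = 100.0 * hits / len(pairs)
    return out

def distribution(classes, fractions=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Chain position (percent) where the first, 25%, ... 100% of a class occur."""
    out = {}
    for g in GROUPS:
        positions = [i + 1 for i, c in enumerate(classes) if c == g]
        if not positions:
            out[g] = [0.0] * len(fractions)
            continue
        row = []
        for f in fractions:
            k = max(1, round(f * len(positions)))
            row.append(100.0 * positions[k - 1] / len(classes))
        out[g] = row
    return out

classes = encode("MKVLAAGIVE")   # hypothetical 10-residue sequence
descriptor = (composition(classes), transition(classes), distribution(classes))
```

The resulting vector has a fixed length regardless of chain length, which is what lets sequences of different sizes be fed to the same classifier.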
[1] Inna Dubchak, Ilya Muchnik, Stephen R. Holbrook (1995). Proc. Natl. Acad. Sci. USA, 92, 8700.
[2] Pascarella, S. and Argos, P. (1992) Prot. Engng., 5, 121 - 137.
A POSTER BY :
Dietlind L. Gerloff, Marcin Joachimiak, Marcel Turcotte, Fred E. Cohen, and Steven A. Benner
Dept. of Pharm. Chemistry, UCSF and Dept. of Chemistry, UFL
email : gerloff@cgl.ucsf.edu
Predictions for six CASP2 ab initio targets were submitted in a collaborative effort to explore the potential for predicting supersecondary and tertiary structure in the transparent secondary structure prediction method developed at the ETH Zurich [1]. The targets were selected based on the availability of homologous protein sequences in adequate numbers and evolutionary distributions in the common databases: fibrinogen (gamma-chain C-terminal fragment, T0005), polyribonucleotide nucleotidyltransferase (S1 motif, T0004), heat shock protein 90 (N-terminal domain, T0011), ferrochelatase (bacterial, T0020), calponin homology domain (of beta-spectrin, T0037), and NK-lysin (T0042).
Multiple alignments were generated using the automated DARWIN-server (http://cbrg.inf.ethz.ch/) [2] and refined in gap regions using a new automated tool [3]. Secondary structures were predicted based on periodicity in 'SURFACE', 'INTERIOR' and 'ACTIVE SITE' positions assigned using heuristics developed at the ETH [4]. From each target alignment, manual predictions were made independently by different members of the prediction team and, in most cases, by an automated implementation of the method [5] and compared with each other. Plausible supersecondary and tertiary structures were predicted semi-systematically from indications about the relative orientation of secondary structural elements that can be derived from the sequence analysis used for secondary structure prediction, other indications such as minimum connection lengths between secondary structure elements, etc., and functional considerations (putative enzymatic activity, etc.). Some prediction problems (e.g. T0004 and T0042) were suitable for a combinatorial analysis similar to [6] or [7], where empirical rules are used to reduce the number of possible folding topologies. In others (e.g. T0011), we based our suggested tertiary structure on manual, knowledge-based "fold recognition" for parts of the target structure. In most predictions, we used a concept of "refining secondary structure prediction by tertiary structure modeling" as one of the novel components in our approach (in comparison with CASP1). Most importantly, the use of transparent approaches for secondary and supersecondary / tertiary structure prediction allows us to rationalize why prediction mistakes occurred where they did, and to refine our heuristics based on these insights [8].
[1] S. A. Benner & D. Gerloff (1990). Patterns of divergence in homologous proteins as indicators of secondary and tertiary structure: a prediction of the structure of the catalytic domain of protein kinases. Adv. Enzyme Regul. 31:121-181.
[2] G. H. Gonnet, M. A. Cohen & S. A. Benner (1992). Exhaustive matching of the entire protein sequence database. Science 256:1443-1445.
[3] C. Korostensky & G. H. Gonnet, unpublished results.
[4] S. A. Benner, I. Badcoe, M. A. Cohen & D. L. Gerloff (1994). Bona fide prediction of aspects of protein conformation. J. Mol. Biol. 235:926-958.
[5] M. Turcotte & S. A. Benner, unpublished results.
[6] D. L. Gerloff, G. Chelvanayagam & S. A. Benner (1995). A predicted consensus structure for the protein kinase C2 homology (C2H) domain, the repeating unit of synaptotagmin. Proteins 22:299-310.
[7] F. E. Cohen & I. D. Kuntz (1989). Tertiary Structure Prediction. In: "Prediction of Protein Structure and the Principles of Protein Structure" (G. D. Fasman, ed.), pp. 647-705.
[8] S. A. Benner, G. Chelvanayagam & M. Turcotte (1996). Bona fide predictions of secondary structure using transparent analyses of multiple sequence alignments. Chem. Rev., submitted for publication.
A POSTER BY :
Ilya A. Vakser
The Rockefeller University, NY
email : ilya@guitar.rockefeller.edu
The Global Range Molecular Matching (GRAMM) methodology is an empirical approach to smoothing the intermolecular energy function by changing the range of the atom-atom potentials. The technique makes it possible to locate the area of the global minimum of intermolecular energy for structures of different accuracy. To predict the structure of a complex, it requires only the atomic coordinates of the two molecules (no information about the binding sites is needed). The program performs an exhaustive 6-dimensional search through the relative translations and rotations of the molecules. The molecular pairs may be two proteins, a protein and a smaller compound, two transmembrane (TM) helices, etc. GRAMM may be used for high-resolution molecules (high-resolution docking), and for inaccurate structures where only the gross structural features are known or cases of large conformational changes (low-resolution docking).
The high-resolution docking is designed for accurate complex predictions, in case of small structural inaccuracies (1,2). At high resolution, GRAMM provides a list of high-score (low-energy) ligand positions, which may be taken as is, or may be refined by other techniques. Since GRAMM does not use a statistical sampling, but rather performs an exhaustive search, it yields all configurations of the complex with the high-score steric fit (within the accuracy of the search step and the molecules' representation). Even if high-resolution structures are available, it is still possible to run docking at low resolution, to determine the potential areas of the global minimum.
The low-resolution docking (3,4) is designed for the prediction of the gross features of a complex, in the case of major structural inaccuracies. It may also be used, in the case of accurate structures, to overcome the multiminima problem (5), or to reveal the role of large-scale structural factors in the formation of protein complexes (6). However, the docking results at the low resolution may give only the general preferences (often nonspecific) in the complex formation. For example, in case of protein-ligand docking, it is the distribution of low-energy ligand positions in the proximity of the binding site of the protein. In case of TM helix packing, it could be a two-dimensional sector where a helix is likely to make a complex with any other helix (due to the low-resolution preferences in helix packing, see Ref. 5).
GRAMM version 1.03 is publicly available at http://guitar.rockefeller.edu.
[1] Katchalski-Katzir, E. , Shariv, I., Eisenstein, M., Friesem, A. A., Aflalo, C., Vakser, I. A., 1992, Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques, Proc. Natl. Acad. Sci. USA, 89: 2195-2199.
[2] Vakser, I. A., Aflalo, C., 1994, Hydrophobic docking: A proposed enhancement to molecular recognition techniques, Proteins, 20: 320-329.
[3] Vakser, I. A., 1995, Protein docking for low-resolution structures, Protein Eng., 8: 371-377.
[4] Vakser, I. A., 1996, Low-resolution docking: Prediction of complexes for underdetermined structures, Biopolymers, 39: 455-464.
[5] Vakser, I. A., 1996, Long-distance potentials: An approach to the multiple-minima problem in ligand-receptor interaction, Protein Eng., 9: 37-41.
[6] Vakser, I. A., 1996, Main-chain complementarity in protein - protein recognition, Protein Eng., 9: 741-744.
A POSTER BY :
Derek Debe, Matt J. Carlson and William A. Goddard III
California Institute of Technology
email : mjc@wag.caltech.edu
The prediction of the conformational global minimum for a given primary protein sequence remains a formidable and unsolved task due to the numerous local minima which populate the rugged folding landscape and the extensive overall conformation expanse available to a folding peptide chain. In general, a procedure employing an efficient large scale conformation search coupled with accurate energy evaluation is necessary. To satisfy these requirements, Goddard group researchers employ a hierarchical technique for the prediction of protein tertiary structure from primary sequence. A Boltzmann Factor Biased (BFB) polymeric growth method employing a reduced protein representation has been developed, providing fast generation of compact, off-lattice, protein fold candidates. Using the Probability Grid Monte Carlo (PGMC) Builder, full atom models are constructed from energetically favorable BFB candidate motifs for further refinement with standard Molecular Dynamics methods. The BFB method is a guided polymeric growth procedure. One chain configuration is generated at a time, adding one residue unit at each step to any user specified starting point in the protein chain. Each residue placement is determined from an evaluation of the nonbond energies for each phi/psi torsion angle sampled. Typically, 50 torsion angles representing an even Ramachandran distribution are sampled. A particular torsion is selected using Monte Carlo techniques, where the probability of a specific torsion selection is given by the ratio of the individual torsional Boltzmann energy to the summation of the Boltzmann energies for all torsions sampled. This method leads to fast generation of Boltzmann energy biased structures. The BFB method employs a reduced representation of the protein for growth and relative energy evaluation of the grown configurations. While the method can be used with any reduced atom force field, we currently use a model developed within the Goddard group.
In our model, all backbone atoms are preserved, as well as the amide hydrogen. The sidechains are represented by a single Cbeta pseudoatom. The position and behavior of this Cbeta atom are calculated from a group of over 100 non-homologous, high resolution Protein Data Bank crystal structures. In addition to the peptide bond torsion, all bond lengths and angles are held fixed. Thus only van der Waals terms and phi/psi torsion terms need be included in the conformation energy calculation. Our current parameters are derived from a high quality all-atom force field and are then modified to fit the phi/psi torsional curves of high quality quantum mechanical calculations on small peptides. Other properties, such as hydrogen bonding, are also derived from quantum mechanics, with further adjustments made to comply with the use of a reduced, fixed bond and angle force field. Since the BFB method is a residue addition growth procedure, known structural constraints can easily be incorporated to reduce the overall conformation space which must be sampled. For sections of the sequence where the secondary structure is known or predicted, torsion angles may be fixed, allowing the secondary structure to be added as a single unit in the growth procedure. Other structural constraints, such as disulfide bonds, metal binding centers, and NOESY distance constraint data can be incorporated through algorithms which quickly weight the probability space of torsions for a given constraint. Furthermore, the residue addition technique can be used in conjunction with di- and tri-peptide library information, as well as segment structure information derived from threading techniques.
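The torsion selection rule of the BFB growth step can be sketched in Python: each sampled torsion is chosen with probability proportional to its Boltzmann factor. The energy function `toy_energy`, the three candidate torsions and the temperature are hypothetical stand-ins. In the method described above the energy would be the nonbond energy of the new residue against the chain grown so far, and about 50 torsions would be sampled.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)

def boltzmann_probabilities(energies, temperature=300.0):
    """p_i = exp(-E_i/kT) / sum_j exp(-E_j/kT), shifted by min(E) for stability."""
    kt = K_B * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def pick_torsion(candidates, energy_of, rng, temperature=300.0):
    """Select one (phi, psi) pair by Monte Carlo with Boltzmann weights."""
    energies = [energy_of(phi, psi) for phi, psi in candidates]
    probs = boltzmann_probabilities(energies, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]

def toy_energy(phi, psi):
    """Hypothetical stand-in for the nonbond energy of one placement (kcal/mol)."""
    return 0.0 if (phi, psi) == (-60.0, -45.0) else 2.0

rng = random.Random(0)
candidates = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]
probs = boltzmann_probabilities([toy_energy(*c) for c in candidates])
chain = [pick_torsion(candidates, toy_energy, rng) for _ in range(5)]
```

Subtracting the minimum energy before exponentiating leaves the probabilities unchanged and avoids overflow when some placements have large clash energies.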
A DEMONSTRATION BY :
Steven Ness, Trevor Hart, Randy Read
University of Alberta, Edmonton, Canada
email : sness@poirot.mmid.ualberta.ca
We have developed a Graphical User Interface (GUI) to our docking programs RESEARCH and GAMMA. RESEARCH is our multiple-start Monte Carlo docking program that allows ligands to be docked with full conformational flexibility to a rigid protein target. Scoring functions used by RESEARCH include a pairwise, group-based Lennard-Jones energy function, a contact scoring algorithm, and a fast, grid-based energy function. RESEARCH was used by our group to predict binding modes of the ligands in the docking section of the CASP2 challenge.
GAMMA utilizes the force fields implemented in RESEARCH, but uses an advanced Genetic Algorithm (GA) as the minimization engine. Advanced features found in our GA engine include higher order gene representations (including real numbers), local sub-populations (demes), adaptive genetic operators, hybrid local improvement procedures and a flexible and open API written in the object-oriented language C++.
The GUI we present allows a user to easily set up and run docking jobs, and provides a common interface to both our docking engines and to their support programs.
A POSTER BY :
Heinz-Theodor Mevissen, Ralf Thiele, Ralf Zimmer and Thomas Lengauer
GMD-SCAI, Schloss Birlinghoven, 53754 Sankt Augustin, P.O. Box 1316, Germany Tel.: +49 2241 14 2302, 2818, 2777
email : Heinz-Theodor.Mevissen|Ralf.Thiele|Ralf.Zimmer|Thomas.Lengauer@gmd.de
We present selected protein structure predictions submitted for the CASP-2 contest in the fold recognition (F/A) category. We have made 16 predictions out of the 22 fold recognition targets with the prediction algorithms implemented in our software tool ToPLign [4]. In particular, we applied three different methods, sequence alignment with newly derived class context substitution matrices, a dynamic programming based algorithm for contact capacity potentials, and a heuristic divide & conquer algorithm to optimize interaction potentials. All three methods align the target sequences against non-redundant representative sets of structures and the whole PDB. By varying important parameters over a set of reasonable settings, these procedures produce several scoring lists, which are ranked according to different criteria. The corresponding alignments are analyzed and reevaluated with various ToPLign tools to come up with a joint 'consistent' prediction of the most probable model structures. Finally, the alignments/threadings with these model structures are refined by tuning alignment parameters using path, energy and reliability profiles as well as parametric optimization algorithms [7] of ToPLign.
For the class context substitution matrix (CSM) based sequence alignment we use standard sequence alignment algorithms with affine gap cost functions to compute global, shift, and local alignments. Class context substitution matrices are substitution matrices for pairs of substrings of amino acids instead of pairs of single amino acids. They are derived from multiple and structural alignments of protein families in order to produce a set of family specific matrices. We estimate statistically whether the alignment score of the target with a respective protein family using the corresponding matrix is significant as compared to the expected score for known members of that family with that matrix. This procedure helps to determine a possible protein family for the target sequence in question and to increase the quality of the alignments in the refinement step.
The 123D method is tailored towards a fast selection of reasonable targets out of a representative fold library. It is a dynamic-programming based, efficient method for optimally aligning protein sequences to protein structures according to empirical environmental/profile scoring potentials. These so-called contact capacity potentials (CCP) [1] are essentially one-body terms, i.e. depend on one amino acid partner of interactions only and measure the preference of amino acids to have a certain number of contacts in certain structural environments. The potentials are supposed to be generalized measures of hydrophobicity, which is assumed to be an essential driving force for folding proteins into their native structure. The potentials are derived from an analysis of a non-redundant database of highly resolved structures by converting relative frequencies into pseudo energies using a normalization according to the inverse Boltzmann law. In a previous evaluation [1] it has been shown that the scoring function is discriminative enough to recognize native sequence-structure relationships and to detect structural folds in the absence of significant sequence similarity.
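The conversion of relative frequencies into pseudo-energies by the inverse Boltzmann law can be sketched in Python. The sketch uses E = -kT ln(P(n | aa) / P(n)), where n is a contact-count bin. The counts, the two bins, the pseudocount and the function names are hypothetical, and the dependence on structural environment mentioned above is omitted for brevity.

```python
import math

def contact_capacity_potential(counts, kt=1.0, pseudocount=1.0):
    """counts[aa][n] = how often residue type aa is seen with n contacts.
    Returns E[aa][n] = -kT * ln(P(n | aa) / P(n))."""
    bins = sorted({n for row in counts.values() for n in row})
    n_types = len(counts)
    # Reference distribution: all residue types pooled, same pseudocounts.
    pooled = {n: sum(row.get(n, 0) for row in counts.values())
                 + pseudocount * n_types for n in bins}
    pooled_total = sum(pooled.values())
    potential = {}
    for aa, row in counts.items():
        aa_total = sum(row.get(n, 0) + pseudocount for n in bins)
        potential[aa] = {}
        for n in bins:
            p_n_given_aa = (row.get(n, 0) + pseudocount) / aa_total
            p_n = pooled[n] / pooled_total
            potential[aa][n] = -kt * math.log(p_n_given_aa / p_n)
    return potential

def threading_score(sequence, contacts_per_position, potential):
    """One-body score: sum of per-position pseudo-energies."""
    return sum(potential[aa][n]
               for aa, n in zip(sequence, contacts_per_position))

# Hypothetical counts: leucine is mostly seen with many contacts, lysine with few.
COUNTS = {"L": {2: 10, 8: 90}, "K": {2: 80, 8: 20}}
E = contact_capacity_potential(COUNTS)
native = threading_score("LK", [8, 2], E)
swapped = threading_score("LK", [2, 8], E)
```

Because each term depends on a single residue and its position, the total score is additive over positions, which is what makes an exact dynamic-programming alignment possible for this class of potential.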
The recursive dynamic programming (RDP) method [6] is tailored towards the discrimination between alternative plausible structure models and the optimization of the respective alignments. RDP is a heuristic approach for the approximate solution of the full threading problem for a wide range of scoring functions, exploiting sequence conservation, sequence patterns [2], environmental profiles [1], and empirical pairwise interaction potentials of the type introduced by Sippl [5]. Threading problems of this kind are known to be NP-hard [3]. RDP hierarchically assembles locally optimal solutions into partial sequence-structure alignments by focusing on highly conserved regions with highest priority. Conservation can be in terms of local sequence similarities, characteristic sequence patterns, or contact capacity. Then, RDP recursively exploits also interactions derived from already mapped parts (in contrast to the frozen approximation approach) in order to detect weaker signals. RDP is able to produce good fold recognition and, at the same time, biologically reasonable sequence-structure alignments using a modest amount of computing resources.
The poster will present selected fold recognition predictions from our 16 CASP2 submissions in order to discuss the advantages and shortcomings of the methods used.
Acknowledgments:
This research has been funded in part by the BMBF funded project PROTAL: "Proteins: Sequence, Structure and Evolution" under grant no. 413-4001-01 IB 301 A/1.
References:
[1] N. Alexandrov, R. Nussinov, and R. Zimmer: "Fast Protein Fold Recognition via Sequence to Structure Alignment and Contact Capacity Potentials", Pacific Symposium on Biocomputing'96, World Scientific Publishing Co., 1996, 53-72.
[2] A. Bairoch and P. Bucher and K. Hofmann: "The PROSITE database, its status in 1995", NAR, 189-196, 1996.
[3] R. Lathrop: "The protein threading problem with sequence amino acid interaction preferences is NP-complete", Protein Engineering, Vol 7, No 9, 1059-1068, 1994.
[4] H. Mevissen, R. Thiele, R. Zimmer, and T. Lengauer: "Analysis of Protein Alignments - The software environment ToPLign", GMD ToPLign WWW interface: http://cartan.gmd.de/ToPLign.html, 1994-96.
[5] M. Sippl: "Calculation of Conformational Ensembles from Potentials of Mean Force: An Approach to the Knowledge-based Prediction of Local Structures in Globular Proteins", JMB, 859-883, 1990.
[6] R. Thiele, R. Zimmer, and T. Lengauer: "Recursive Dynamic Programming for Adaptive Sequence and Structure Alignment", ISMB'95, C. Rawlings et al. (Eds.), AAAI Press, 384-392.
[7] R. Zimmer and T. Lengauer, "Fast and Numerically Stable Parametric Alignment of Biosequences", First Annual International Conference on Computational Molecular Biology (RECOMB'97), M. Waterman (Ed.), ACM Press, to appear 1997.
A POSTER BY :
Russ B. Altman, Mark Gerstein and Robert Schmidt
Stanford University
email : russ.altman@stanford.edu
As the number of proteins in the PDB increases, it becomes possible to group proteins with similar structures together into families, and to statistically summarize the properties of these families. The Library of Protein Family Cores (LPFC) is a WWW resource that contains analysis of protein families from the HOMALDB (Sali & Overington) and the FSSP (Holm & Sander) multiple structural alignment methodologies. For each protein family, it provides the individual family members with links to other related resources (SCOP, SWISSPROT, PDB, PROTEIN MOTIONS DATABASE), as well as the average coordinates for the core residues of each family, the spatial variability of each aligned residue, and a rank ordering of the spatial variabilities allowing the definition of a low variance protein core. The LPFC also contains VRML files that graphically illustrate the variability of each residue using three-dimensional ellipsoid renderings. The LPFC is a useful resource for homology modeling and threading applications that require reliable sets of average coordinates, and estimates of the uncertainty in these coordinates for new family members.
http://www-smi.stanford.edu/projects/helix/LPFC
Protein Science, in press.
A POSTER BY :
Liping Wei, Jeffrey Chang and Russ B. Altman
Stanford University
email : russ.altman@stanford.edu
We have developed a method for describing protein three-dimensional sites using a set of biochemical and biophysical descriptors [Bagley & Altman, Protein Science 4, p. 622-635]. We have recently shown that the statistical model produced by our method can be used to "score" a new structure to determine if it contains a site of interest. We have recently applied these techniques to the threading problem. Each core backbone position in a fold can be considered a "site" and a new sequence that has been mounted on a structure can be scored at each amino acid "site" to determine the overall compatibility of the sequence with the fold. We have tested our method on the globins (using cross-validation) and have shown that we are able to distinguish globin-like folds from random, shuffled, and other all-alpha sequences, including colicin and phycocyanin. We are in the process of building models for other protein families for a more exhaustive evaluation of the method. The key details of the method include:
1. We use protein cores taken from multiple alignments, as reported in the Library of Protein Family Cores (http://www-smi.stanford.edu/projects/helix/LPFC).
2. We score only the amino acid environments that are useful for maximizing the distinction between proper and improper folds. For the globins, a core set of 34 amino acids (out of approximately 104 aligned positions) gives optimal separation of globins from non-globins.
3. We generate alignments by allowing no gaps in core elements, and generating statistical models of gap lengths using training data sequence alignments, in order to generate plausible gap models. We are experimenting with ways to add a stochastic element to these models to try to capture large inserts never before observed in training data.
4. Our scoring function uses multiple types of physico-chemical properties at levels of detail ranging from atomic to residue and secondary structural, and implicitly models multiple orders of interaction between amino acids. The scoring function can be compiled into lookup tables and precached for each fold model, for rapid evaluation of plausible alignments.
A POSTER BY :
Peter J. Munson and Raj K. Singh
DCRT, NIH and University of North Carolina
email : munson@helix.nih.gov
Using a simplified representation (one particle per residue), we have constructed a contact potential which uses up to 4-body interactions. Such high-order interactions may arise from side-chain packing requirements of the protein core or from details of charged-pair interactions. As suggested in (Singh, Tropsha et al. 1996), we use a filtered Delaunay tessellation of the C-alpha carbon centers to determine which 4-tuples of residues are considered to interact. The potential is estimated as the coefficients of a log-linear statistical model describing the propensities of 4 residues to occur together in a database of 608 nonredundant proteins with less than 35% mutual pairwise identity (Hobohm and Sander 1994). Statistical tests showed the three-body interactions to be strongly significant, and the 4-body interactions to be important at a lower level of statistical significance. The potential was also tested on a set of 12 same-length pairs of proteins of known structure (as described in Holm and Sander 1992). The sensitivity of this potential was evaluated under the following two protocols: (1) Sequence-recognizes-Structure, where a query sequence is threaded without gaps through its native and a non-native structure; (2) Structure-recognizes-Sequence, where a query structure is threaded by its native and a non-native sequence. Of the 24 possible comparisons for the test set, sequence recognized the native structure in all 24 cases. Conversely, structure recognized the native sequence in 23 of 24 cases. The margin by which the correct structure was chosen was generally proportional to the sequence length, and increased as higher-order terms were added to the model, suggesting again that terms beyond pairwise are important to protein structure/sequence testing.
A POSTER BY :
Azat Ya. Badretdinov , Michael Sternberg , Cyrus Chothia and Alexei V. Finkelstein
Institute of Protein Research, Pushchino, Russia and Rockefeller University, New York, USA
email : azat@guitar.rockefeller.edu
Can the visible symmetry of two interacting alpha helices extend to the atomic level responsible for the close packing and rigidity of a protein? We show that a two-fold symmetry axis in the close packing of an alpha-helical pair in proteins is incompatible with the requirement of the "ridge-into-groove" type of interhelical interaction. Thus, packing of two helices could be implemented in at least two different packing patterns. Two consequences follow. First, in the relevant protein architectures, sequences that are supposed to code for a symmetric fold may be forced to break their internal symmetry, even where symmetry seems reasonable at first glance. Second, when representative protein domains are selected for incorporation into a threading database, not only should the protein fold space be exhaustively explored, but the space of all possible packing patterns should be searched as well.
A POSTER BY:
Francisco Melo, Cecilia Riquelme and Ernest Feytmans
Facultes Universitaires Notre Dame de la Paix, Department of Biology, Laboratory of Molecular Structural Biology, 61 Rue de Bruxelles, 5000 Namur, Belgium.
e-mail: fmelo@biq.fundp.ac.be
We present a very sensitive and accurate mean force potential (MFP) at the atomic level, able to identify high energy zones (HEZs) in protein structures. These HEZs correlate very accurately with local errors or misalignments in protein models that have been refined using different methods that failed to detect these errors. In these refined models, the local errors are frequently found in loops or zones of structural difference between the template and the native protein.
The MFP has been used to calculate only long-range interactions. We have defined 40 different atom types within the 167 heavy atoms of the 20 amino acids, obtaining 820 (40 x 41 / 2) pairs of atom types and their respective energy distributions. The definition of each atom type depends on its bond connectivity, chemical nature and location level (side chain or backbone) [1]. The long-range environment of an atom is defined as all the heavy atoms within a Euclidean distance of 7 Å that belong to an amino acid more than 11 residues away in the chain or that belong to another chain.
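The long-range environment defined above can be sketched as follows. This is only an illustration of the stated distance and sequence-separation criteria; the `Atom` record, the function names, and the choice to treat "within 7 Å" as inclusive are assumptions, not details taken from the poster.

```python
from collections import namedtuple
from math import dist

# Hypothetical minimal atom record: chain identifier, residue number, coordinates in Å.
Atom = namedtuple("Atom", ["chain", "resnum", "xyz"])


def n_unordered_pairs(n_types):
    """Number of unordered pairs of atom types, including pairs of a type with itself."""
    return n_types * (n_types + 1) // 2


def long_range_environment(atoms, center, cutoff=7.0, min_separation=11):
    """Return the atoms in the long-range environment of `center`.

    An atom qualifies if it lies within `cutoff` Å of `center` and either
    belongs to another chain or is more than `min_separation` residues
    away along the same chain.
    """
    environment = []
    for other in atoms:
        if other is center:
            continue
        same_chain = other.chain == center.chain
        if same_chain and abs(other.resnum - center.resnum) <= min_separation:
            continue  # too close in sequence: a local, not long-range, interaction
        if dist(center.xyz, other.xyz) <= cutoff:
            environment.append(other)
    return environment


# Toy example with five atoms on two chains.
toy_atoms = [
    Atom("A", 1, (0.0, 0.0, 0.0)),
    Atom("A", 5, (1.0, 0.0, 0.0)),    # near in space but only 4 residues away
    Atom("A", 20, (3.0, 0.0, 0.0)),   # long-range and within 7 Å
    Atom("A", 30, (10.0, 0.0, 0.0)),  # long-range but too far in space
    Atom("B", 2, (0.0, 4.0, 0.0)),    # other chain, within 7 Å
]
toy_env = long_range_environment(toy_atoms, toy_atoms[0])
```

For 40 atom types, `n_unordered_pairs(40)` reproduces the 820 pairs quoted in the text.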
We computed three-dimensional energy profiles, evaluating the long range environment of each atom, in many models submitted to Asilomar I experiment in the category of comparative modeling. Our results show that these energy profiles are able to detect: 1) misalignments, 2) errors in loop regions, and 3) errors in zones of structural variability between the template and the target protein.
The poster describes all the models analyzed and the results obtained. The results were also compared with the structural analyses made by some groups after the first Asilomar meeting [2].
[1] Francisco Melo and Ernest Feytmans. (submitted to J. Mol. Biol)
[2] Proteins: Structure, Function, and Genetics 23, 1995.
A POSTER AND SOFTWARE DEMONSTRATION BY:
Francisco Melo, Cecilia Riquelme and Ernest Feytmans
Facultes Universitaires Notre Dame de la Paix, Department of Biology, Laboratory of Molecular Structural Biology, 61 Rue de Bruxelles, 5000 Namur, Belgium.
e-mail: fmelo@biq.fundp.ac.be
We present software that computes three-dimensional long-range energy profiles in proteins, detecting high energy zones (HEZs) in these structures. These HEZs correlate very accurately with local errors or misalignments in protein models. In addition, the existence of these HEZs in experimentally solved protein structures can possibly represent: 1) local errors such as violations of van der Waals radii limits, missing atoms or structural gaps in the structure, exposed hydrophobic residues, or the presence of heteroatom groups or chemical ligands near or covalently bound to these HEZs, 2) interacting zones between subunits in multimeric proteins, or 3) interacting zones in monomeric proteins, i.e. when these proteins form a multiprotein complex.
In this poster we present a full description of this software and some examples of its utility in molecular biology [1].
We have developed a WWW server that can be used to obtain these HEZs, by calculating a three-dimensional long-range energy profile of a protein structure containing one or more chains. This server is currently available on the Internet at: http://www.fundp.ac.be/sciences/biologie/bms/fmelo/3D-Profile-Maker.html.
[1] Francisco Melo and Ernest Feytmans. (submitted to Science)
A POSTER BY:
Jean-Pierre Kocher, Martine Prevost, Shoshana Wodak and Byungkook
National Institutes of Health and Universite Libre de Bruxelles
e-mail: bkl@helix.nih.gov
Recent data on the stability changes of globular protein molecules upon mutation of residues buried inside the protein indicate that internal cavities exert a strong influence on their stability. Here we calculate the free energy cost of forming sub-atomic and atomic size cavities inside the protein, using a method developed for such calculations in liquids. This involves computing the likelihood of finding spherical cavities of a given size in a dynamically equilibrated ensemble of the molecule, or of the collection of molecules in the case of the liquids.
The free energy values for forming very small cavities are directly related to the volume packing density, or the fraction of space physically occupied by the atoms in the system. The free energies for forming larger, atomic size cavities depend on the manner in which the unoccupied empty space is distributed through the medium, a property hitherto analyzed only in liquids, but not in the heterogeneous protein matrix.
The results of the calculations on water, hexane, barnase and T4 phage lysozyme lead to the following conclusions: (1) The free energy of cavity formation is higher in proteins than in either water or hexane. This provides direct evidence that the native protein medium differs in fundamental ways from either of these two liquids. (2) The packing density is higher in the non-polar region of the protein than in the polar one on average. Yet, the cost of forming atomic size cavities is lower in the non-polar than in the polar region. (3) These atomic size cavities tend to occur in highly localized regions of the molecule, primarily in the hydrophobic cores where buried hydrophobic side-chains, belonging to well-developed secondary structures, meet one another. These locations also appear to be more compressible than other parts of the core or surface.
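The relation between cavity likelihood and free energy can be sketched with the standard expression dG = -kT ln p, where p is the probability of finding an empty sphere of the given radius. The code below is an illustrative sketch only: treating atoms as points, passing the probe positions in explicitly, and the function names are all assumptions and do not describe the authors' implementation.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def cavity_probability(snapshots, probes, radius):
    """Fraction of (snapshot, probe) trials in which the probe sits in an empty sphere.

    snapshots : list of atom coordinate lists, one per frame of the ensemble
    probes    : list of trial cavity centres (x, y, z)
    radius    : cavity radius in Å; a trial succeeds if no atom centre lies
                closer than `radius` to the probe (atoms treated as points,
                which is a simplifying assumption)
    """
    trials = 0
    hits = 0
    for atoms in snapshots:
        for probe in probes:
            trials += 1
            if all(math.dist(probe, atom) >= radius for atom in atoms):
                hits += 1
    if trials == 0:
        raise ValueError("need at least one snapshot and one probe")
    return hits / trials


def cavity_free_energy(probability, temperature=298.0):
    """Free energy of cavity formation, -kT ln p, in kcal/mol."""
    if probability <= 0.0:
        return math.inf  # no cavity observed in the sample
    return -K_B * temperature * math.log(probability)


# Toy ensemble: two frames with a single atom each, two probe positions.
toy_snapshots = [[(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
toy_probes = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
toy_p = cavity_probability(toy_snapshots, toy_probes, radius=1.0)
toy_dg = cavity_free_energy(toy_p)
```

Larger radii make empty spheres rarer, so the estimated probability falls and the free energy cost rises.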
A DEMONSTRATION BY :
Rajgopal Srinivasan and George D. Rose
Johns Hopkins University, School of Medicine, Department of Biophysics and Biophysical Chemistry, 725 N. Wolfe Street, Baltimore, MD 21205
email : rose@grserv.med.jhu.edu
LINUS is a hierarchic procedure to predict the fold of a protein from its amino acid sequence alone. The name is an acronym for Local Independently Nucleated Units of Structure. The algorithm, which has been implemented in a computer program, ascends the folding hierarchy in discrete stages, with concomitant accretion of structure at each step. The chain is represented by simplified geometry and folds under the influence of a primitive energy function. The only accurately described energetic quantity in this work is hard sphere repulsion -- the principal force involved in organizing protein conformation.
As part of the CASP2 experiment, predictions for five targets were submitted: T0004, T0011, T0026, T0030 and T0037. We will use these targets to demonstrate the success and shortcomings of LINUS as a general tool for protein structure prediction. Possible improvements to the procedure will also be discussed.
A POSTER BY :
Ram Samudrala and John Moult
CARB, 9600 Gudelsky Drive, Rockville, MD 20850
email : ram@iris3.carb.nist.gov
The interconnected nature of interactions in protein structures appears to be the major hurdle preventing the construction of accurate homology models. We present an algorithm that uses graph theory to handle this problem. Each possible conformation of a residue in a protein structure is represented as a node in a graph. Each node is given a weight based on the degree of interaction between its side chain atoms and the local main chain atoms. The weight is computed using an all-atom conditional probability discriminatory function. Edges are then drawn between pairs of residues/nodes that are consistent with each other (i.e., clash-free and satisfying geometrical constraints). The edges are also weighted according to the probability of the interaction between atoms in the two residues. Once the entire graph is constructed, all the maximal sets of completely connected nodes (cliques) are found using a clique-finding algorithm. The cliques with the best probabilities represent the optimal combinations of mixing and matching between the various possibilities, taking the respective environments into account.
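The clique-based selection can be illustrated with a small sketch. It uses the basic Bron-Kerbosch algorithm as a stand-in for the unspecified clique-finding algorithm, and it assumes node and edge weights are log-probabilities that can be summed. The node labels, weights and function names are hypothetical.

```python
from itertools import combinations


def maximal_cliques(adjacency):
    """Yield every maximal clique of an undirected graph (basic Bron-Kerbosch).

    adjacency : dict mapping each node to the set of its neighbours
    """
    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            yield frozenset(current)
            return
        for node in list(candidates):
            yield from expand(current | {node},
                              candidates & adjacency[node],
                              excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    yield from expand(set(), set(adjacency), set())


def best_clique(node_logp, edge_logp):
    """Return the maximal clique with the highest summed log-probability.

    node_logp : dict node -> log-probability of that residue conformation
    edge_logp : dict (node, node) -> log-probability of the pair interaction;
                an edge exists only between mutually consistent conformations
    """
    adjacency = {node: set() for node in node_logp}
    for a, b in edge_logp:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def score(clique):
        total = sum(node_logp[n] for n in clique)
        for a, b in combinations(sorted(clique), 2):
            total += edge_logp.get((a, b), edge_logp.get((b, a), 0.0))
        return total

    return max(maximal_cliques(adjacency), key=score)


# Toy graph: two residues with two candidate conformations each ("1a", "1b", ...).
# Conformations of the same residue are never connected.
toy_nodes = {"1a": -1.0, "1b": -0.5, "2a": -0.2, "2b": -2.0}
toy_edges = {("1a", "2a"): -0.1, ("1b", "2b"): -0.3}
toy_best = best_clique(toy_nodes, toy_edges)
```

Because summed log-probabilities favour smaller cliques, a fuller version would probably compare only cliques that assign a conformation to every residue; in the toy graph both candidates do.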
The poster will describe in detail how the above approach works in a comparative-modelling scenario and assess the predictive power of this method by applying it to properly controlled blind tests.
A POSTER BY :
Serge Batalov and Ruben A. Abagyan
The Skirball Institute of Biomolecular Medicine, NYU
email : batalov@mercury.med.nyu.edu
Sequence comparison remains a powerful tool to identify structural relatedness of two proteins. To improve reliability of recognition in the "twilight zone" of sequence identities between 15 and 35%, we analyzed alignments of sequences of protein domains with known 3D folds and derived a set of functions representing structural significance and positional accuracy of a sequence alignment. The subset of 1,347,037 alignments between sequences of structurally unrelated domains was used to derive accurate probability functions of a structurally insignificant alignment at a given sequence identity and alignment score. It is shown that sequence identity and sequence similarity measures are poor indicators of structural relatedness in the twilight zone, while the alignment score separates alignments of structurally related sequences from those of structurally unrelated sequences much better, the expected recognition error being three times lower. The alignment score and the derived probability functions can be used for fold recognition through a multilink chain of significant sequence alignments starting from a query sequence and ending at the sequence of a protein with known fold.
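The multilink chain idea can be sketched as a shortest-path search over a graph whose edges are the significant alignments. The breadth-first search, the data layout and all names below are assumptions chosen for illustration; the poster does not specify how chains are constructed.

```python
from collections import deque


def fold_via_chain(query, significant, known_fold):
    """Find the shortest chain of significant alignments from query to a known fold.

    query       : identifier of the query sequence
    significant : dict mapping a sequence id to the ids it aligns to significantly
    known_fold  : dict mapping sequence ids of known structure to a fold label
    Returns (fold_label, chain) or (None, []) if no chain exists.
    """
    parent = {query: None}
    queue = deque([query])
    while queue:
        current = queue.popleft()
        if current in known_fold:
            # Walk back through the parents to recover the chain of links.
            chain = []
            node = current
            while node is not None:
                chain.append(node)
                node = parent[node]
            return known_fold[current], chain[::-1]
        for neighbour in significant.get(current, ()):
            if neighbour not in parent:
                parent[neighbour] = current
                queue.append(neighbour)
    return None, []


# Toy network: the query reaches a protein of known fold through two intermediates.
toy_links = {
    "query": ["s1"],
    "s1": ["query", "s2"],
    "s2": ["s1", "pdb_x"],
    "pdb_x": ["s2"],
}
toy_folds = {"pdb_x": "fold_X"}
toy_fold, toy_chain = fold_via_chain("query", toy_links, toy_folds)
```

Breadth-first search returns the chain with the fewest links, which limits the accumulation of error across intermediate alignments.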
A DEMONSTRATION BY :
Domingues F., Jaritz M., Floeckner Hannes and Sippl M.
Center of Applied Molecular Engineering, Jakob-Haringerstr. 1, A-5020 Salzburg, AUSTRIA
email : domingues@came.sbg.ac.at
We analyzed the quality of predictions made by ProFIT (a fold recognition tool) using several target sequences of known structure. The ProFIT user did not know the structure of the targets in advance and was asked to predict a fold similar to the native one out of a list of 1300 structures. The list used was a subset of the PDB representative of the different known fold types. In each case the target structure was excluded from this list. Also, the sequence homology between the target and any of these 1300 representative folds was not higher than 25% as tested by FASTA.
We present results for four representative cases. Three are successful predictions (beta spectrin, a phosphatase, a nucleoside phosphorylase); the fourth is an unsuccessful case (NFKB P50). The quality of the predictions and of the sequence-structure alignments was assessed by structure-structure comparison using ProSup. Beyond testing ProFIT accuracy, we discuss the results focusing on possible strategies for fold recognition.
A POSTER BY :
Floeckner Hannes(1), Jurgen Bajorath(2) and Manfred Sippl(1)
(1) Center of Applied Molecular Engineering, Jakob-Haringerstr. 1, A-5020 Salzburg, AUSTRIA; (2) Bristol-Myers Squibb Pharmaceutical Research Institute, 3005 First Avenue, Seattle WA 98121, USA
email : hannes@trixi.came.sbg.ac.at
CD44 is a widely distributed polymorphic type I transmembrane glycoprotein. It has been implicated in a variety of biological functions, many of which are related to its interaction with hyaluronan. CD44 has also been shown to bind to the cytokine osteopontin, which is an inducer of cellular chemotaxis.
A striking feature of CD44 is its heterogeneous expression. The CD44 gene consists of 19 exons, 12 of which can be alternatively spliced. Following exon e5, 10 additional exons can be inserted in many combinations. Mutations of residues in exon e5 were found to affect hyaluronan binding, suggesting that CD44 e5 plays a critical role in cell communication. Understanding the 3D structure of the CD44 e5 variant and the structural requirements of ligand attachment would help in the design of proteins to trigger biological responses to cytokines.
Fold recognition (ProFIT 2.0) was used to combine the sequence of CD44 e5 with 1403 protein structures of the Brookhaven databank. Human cytokines and interleukins (e.g. 1hum-A, 3il8) were found to be the most probable models for CD44 e5. The best fold candidates were selected and subjected to further analysis. ProSup 2.0 was applied for detailed analysis of structural similarities among these candidates. Finally, a full-atom model for CD44 e5 could be built. This final model was evaluated using PROSA II.
To increase the reliability of the predicted model, the same procedure was applied to several homologous sequences of CD44 e5. The results for these sequences agree very well with those obtained for human CD44 e5. From the data obtained we propose that the sequence of human CD44 e5 adopts a fold similar to human interleukin 3il8.
ProStar - A test site for protein potentials
A POSTER AND A DEMONSTRATION BY :
Michael Braxenthaler, Ram Samudrala, Jan T. Pedersen and John Moult
CARB/UMBI, 9600 Gudelsky