Krzysztof Fidelis
Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, CA 94551
Comparative modeling of histidine-containing phosphocarrier (HPr) protein from Mycoplasma capricolum (McHPr), the Eosinophil Derived Neurotoxin (edn), and the mouse cellular retinoic acid-binding protein (crabpI)
A method for homologous modeling of protein structures using a mixture of commercial, public-domain, and "in-house" algorithms is presented. Models were evaluated using a battery of analytical methods. The steps in the modeling process were as follows. The FASTA, OWL, ENTREZ, PROSRCH, SCOP, and PHD databases and programs were used to obtain sequences and structures related to the target protein. A multiple sequence alignment of the sequences obtained from these sources was generated with the AMPS package (developed by Geoffrey Barton). Alpha-carbon (C-alpha) root mean square deviations (RMSDs) were computed between the structure with the highest sequence identity to the target and each of the other known structures, using the Align program (developed by G.H. Cohen). A structural alignment was also computed using the G program (developed by Jan Pedersen) to identify discrepancies in the AMPS alignments.
A minimum perturbation (MP) technique (which preserves the backbone phi/psi and equivalent sidechain chi angles) was used to generate initial coordinates for the residues in the target protein, using the information contained in the structure with the highest sequence identity. Mainchain stretches judged unfavorable by the structural analysis described below were improved by borrowing structural fragments from other homologues.
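The C-alpha RMSD comparison described above can be illustrated with a minimal sketch. It is not the Align program: it assumes the two structures have already been superimposed and that the residue equivalences are known, so the two coordinate lists are the same length and in the same order. The function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha positions.

    Assumes the structures are already superimposed and that position i in
    one list is structurally equivalent to position i in the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates (in Angstroms) for three equivalent C-alpha atoms.
parent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
homologue = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(ca_rmsd(parent, homologue))  # every atom is displaced by 1 A, so 1.0
```

In practice the superposition itself (for example by a least-squares fit) would be performed before this step; that part is omitted here.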
Insertions and deletions were built using several ab initio loop-building and database search methods. Short loops (5-7 residues) were built using the algorithm of Moult & James and Congen (developed by R. Bruccoleri). Longer loops were built using a database search method with constraints derived from the framework of the parent structure.
In addition, for each position in the structure, different side-chain rotamers were generated using the Insight (Biosym Technologies) and Quanta (MSI) packages and the Self Consistent Domain (SCD) method (developed by Krzysztof Fidelis and John Moult). Insight uses an MP method as described above, but with a slightly different equivalence table. Quanta searches the conformational space available to an individual sidechain by spinning each of the chi angles, evaluating the energy of each rotamer, and selecting the lowest-energy conformation. The SCD method is a locally developed systematic search algorithm that explores, within a residue-based domain, all possible sets of sidechain rotamers.
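The idea of a systematic search over all sets of sidechain rotamers within a small domain can be sketched as a plain exhaustive enumeration. This is not the SCD implementation; the residue names, chi1 values, self energies, and the clash rule below are all hypothetical and serve only to show the combinatorial search.

```python
from itertools import product

def best_rotamer_set(rotamers, score):
    """Score every combination of one rotamer per residue; return the lowest."""
    residues = sorted(rotamers)
    best_combo, best_score = None, float("inf")
    for choice in product(*(rotamers[r] for r in residues)):
        assignment = dict(zip(residues, choice))
        s = score(assignment)
        if s < best_score:
            best_combo, best_score = assignment, s
    return best_combo, best_score

# Hypothetical per-rotamer self energies, keyed by chi1 angle in degrees.
self_energy = {
    "Leu12": {-60: 0.0, 180: 1.0, 60: 3.0},
    "Phe45": {-60: 0.0, 180: 2.0},
    "Ser47": {-60: 1.0, 60: 0.0, 180: 1.5},
}
domain = {res: list(table) for res, table in self_energy.items()}

def toy_score(assignment):
    """Self energies plus one made-up pairwise clash penalty."""
    total = sum(self_energy[res][chi] for res, chi in assignment.items())
    if assignment["Leu12"] == -60 and assignment["Phe45"] == -60:
        total += 5.0  # assumed steric clash between these two rotamers
    return total

best, energy = best_rotamer_set(domain, toy_score)
print(best, energy)
```

The example shows why the search has to consider sets rather than single residues: each residue's individually preferred rotamer is -60 for Leu12 and Phe45, but the pairwise clash makes that combination unfavorable, so the lowest-scoring set moves Leu12 to 180. The number of combinations grows as the product of the rotamer counts, which is why such a search is restricted to a limited domain.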
The rotamers were then evaluated on the basis of their environment, electrostatic interactions, and hydrophobic burial. The rules used in the environment evaluation covered packing (whether too much or too little space was left after any change), favorable and unfavorable electrostatic interactions of sidechains and mainchain, van der Waals clashes, and burial or exposure of a residue. Electrostatic interactions were evaluated using the local Eneana program, and the quality of the burial of a residue was evaluated using a conditional probability formalism (developed by R. Samudrala & J. Moult).
The set of rotamers and loops that was most favorable with respect to the above criteria was used to compose a final model. The final model was refined using careful energy minimization, during which electrostatic interactions were ignored. The minimizations were performed with the Discover (Biosym Technologies) and CHARMM (MSI) potentials.
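What it means to minimize with electrostatic interactions ignored can be illustrated with a toy one-dimensional example. This is not the Discover or CHARMM force field: it is a steepest-descent sketch on a single atom pair in reduced (arbitrary) units, with a 12-6 Lennard-Jones term and an optional Coulomb term. All function names, parameters, and values are assumptions made for illustration.

```python
def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """dE/dr of the 12-6 Lennard-Jones energy 4*eps*((s/r)**12 - (s/r)**6)."""
    return 4.0 * epsilon * (-12.0 * sigma**12 / r**13 + 6.0 * sigma**6 / r**7)

def coulomb_gradient(r, q1, q2, k=1.0):
    """dE/dr of the Coulomb energy k*q1*q2/r (reduced units)."""
    return -k * q1 * q2 / r**2

def minimize_distance(r, q1=0.0, q2=0.0, use_electrostatics=False,
                      step=0.01, n_steps=500):
    """Fixed-step steepest descent on the separation r of one atom pair."""
    for _ in range(n_steps):
        gradient = lj_gradient(r)
        if use_electrostatics:
            gradient += coulomb_gradient(r, q1, q2)
        r -= step * gradient
    return r

# Oppositely charged pair, minimized with and without the Coulomb term.
r_vdw_only = minimize_distance(1.5, q1=1.0, q2=-1.0)
r_with_elec = minimize_distance(1.5, q1=1.0, q2=-1.0, use_electrostatics=True)
print(r_vdw_only, r_with_elec)
```

With the Coulomb term switched off, the pair settles at the Lennard-Jones minimum, 2^(1/6) times sigma, regardless of the charges. With it switched on, the attraction between the opposite charges pulls the pair closer. In this toy setting, ignoring electrostatics therefore keeps the refinement from distorting geometry through charge-charge terms.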