S. G. Galaktionov and G. R. Marshall
Center for Molecular Design, Washington University, St. Louis, MO 63130

CALCULATION OF TERTIARY STRUCTURE OF SUBTILISIN BPN' PROPEPTIDE AND DOMAIN 3 OF THE PRODUCT OF DROSOPHILA GENE STAUFEN BASED ON PREDICTION OF THEIR INTRAGLOBULAR CONTACT MATRICES.

We used an updated version of the approach previously described [1]. The approach uses four main procedures: prediction of the secondary structure; prediction of the coordination number vector (i.e., the number of contacts for each residue); prediction of the contact matrix; and reconstruction of the spatial structure (C-alpha coordinates) on the basis of the contact matrix. These procedures can be iterated in cyclic schemes to refine and edit, in concert, the coordination number vector, the locations of the secondary structure elements, the contact matrix, and the tertiary structure. The structures of both the subtilisin BPN' propeptide (prosub) and domain 3 of the product of the Drosophila gene staufen (staufen) were predicted using essentially the same cyclic protocol.

Three known algorithms, based on Bayesian statistics, information theory, and neural networks, were used for SECONDARY STRUCTURE PREDICTION as implemented in SYBYL 5.5. The starting locations of the secondary structure elements were assigned to regions of complete consensus among all three predictions for prosub (helices, 18-24, 45-58, 67-74; beta-strand, 9-12) and of 80% consensus for staufen (helices, 14-25, 56-70; beta-strand, 39-57).
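The consensus step can be illustrated with a minimal sketch. It assumes each predictor returns a per-residue string over H (helix), E (strand) and C (coil); the function name, the state alphabet, the `min_agreement` threshold and the `min_length` filter are illustrative assumptions and are not taken from SYBYL or from the protocol described here. In particular, how the 80% consensus level was computed is not specified above, so the threshold is left as a free parameter.

```python
def consensus_segments(predictions, min_agreement=1.0, min_length=3):
    """Return (state, start, end) segments, 1-based and inclusive, where at
    least `min_agreement` of the predictors assign the same non-coil state.

    `predictions` is a list of equal-length strings over 'H', 'E', 'C'
    (an assumed encoding, one string per prediction method).
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all predictions must have the same length")

    # Per-residue consensus label; '-' marks coil or insufficient agreement.
    labels = []
    for i in range(n):
        column = [p[i] for p in predictions]
        best = max(sorted(set(column)), key=column.count)
        agreed = column.count(best) / len(column) >= min_agreement
        labels.append(best if agreed and best != "C" else "-")

    # Collapse runs of identical labels into segments.
    segments = []
    start = 0
    for i in range(1, n + 1):
        if i == n or labels[i] != labels[start]:
            if labels[start] != "-" and i - start >= min_length:
                segments.append((labels[start], start + 1, i))
            start = i
    return segments


if __name__ == "__main__":
    toy = ["CCHHHHHCCEEEC",   # hypothetical output of method 1
           "CCHHHHCCCEEEC",   # hypothetical output of method 2
           "CHHHHHHCCEEEC"]   # hypothetical output of method 3
    print(consensus_segments(toy))                     # complete consensus
    print(consensus_segments(toy, min_agreement=0.6))  # relaxed consensus
```

Lowering the threshold lengthens the starting segments at the cost of including residues on which the methods disagree, which is the trade-off between the complete-consensus and relaxed-consensus settings.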

The algorithm previously described [2] was used for PREDICTION OF COORDINATION NUMBER VECTORS, starting from the amino acid sequences and the data on the location of the secondary structure.

The core part of the procedure, the PREDICTION OF THE CONTACT MATRIX, includes routines for calculating the starting matrix, bringing it into conformity with a set of hard constraints (such as symmetry), and satisfying specific optimization criteria regarding the coordination number vector, the contact matrix, its powers, and its eigensystems. The refinement routines are organized as an iterative procedure that removes the "weakest" contacts (according to a given "goodness" matrix) and fills the resulting vacancies in rows/columns at the most preferred positions. There are many possible ways to combine single operations and optimization matrices in this procedure. The protocol we applied for the first calculation of the contact matrix for both proteins was the one that reconstructed the spatial structure of 3FLX, a de novo designed protein, with an accuracy of 2.8 A. 3FLX is of the same size as prosub and staufen (79, 77 and 79 residues, respectively) and, like both predicted proteins, significantly helical.
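One pass of the remove-and-fill refinement might look like the following sketch. It is a simplified illustration and not the protocol used here: it enforces only symmetry and the target coordination numbers, ignores the criteria on matrix powers and eigensystems, and treats the goodness matrix as a given symmetric table of scores. The function name `refine_contacts` and all argument names are hypothetical.

```python
def refine_contacts(contacts, goodness, target, max_iter=1000):
    """Greedy sketch of contact-matrix refinement (illustrative only).

    contacts : symmetric 0/1 matrix as a list of lists
    goodness : symmetric matrix of contact scores (higher = more preferred)
    target   : predicted coordination number for each residue
    """
    n = len(contacts)
    c = [row[:] for row in contacts]      # work on a copy
    degree = [sum(row) for row in c]      # current coordination numbers

    for _ in range(max_iter):
        changed = False

        # Removal: over-coordinated residues lose their weakest contact.
        for i in range(n):
            if degree[i] > target[i]:
                partners = [j for j in range(n) if c[i][j]]
                j = min(partners, key=lambda k: goodness[i][k])
                c[i][j] = c[j][i] = 0     # keep the matrix symmetric
                degree[i] -= 1
                degree[j] -= 1
                changed = True

        # Filling: under-coordinated residues gain their most preferred
        # contact among partners that also have a vacancy.
        for i in range(n):
            if degree[i] < target[i]:
                candidates = [j for j in range(n)
                              if j != i and not c[i][j]
                              and degree[j] < target[j]]
                if candidates:
                    j = max(candidates, key=lambda k: goodness[i][k])
                    c[i][j] = c[j][i] = 1
                    degree[i] += 1
                    degree[j] += 1
                    changed = True

        if not changed:
            break
    return c


if __name__ == "__main__":
    start = [[0, 1, 1, 1],
             [1, 0, 0, 0],
             [1, 0, 0, 0],
             [1, 0, 0, 0]]
    score = [[0.0, 0.9, 0.5, 0.1],
             [0.9, 0.0, 0.2, 0.3],
             [0.5, 0.2, 0.0, 0.8],
             [0.1, 0.3, 0.8, 0.0]]
    for row in refine_contacts(start, score, target=[1, 1, 1, 1]):
        print(row)
```

Because a contact is added only when both residues are below their targets, filling never creates a new excess, so the total excess can only decrease and the loop terminates.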

For RECONSTRUCTION OF SPATIAL STRUCTURE on the basis of the contact matrix, we used our procedure [1]. Its parameters were tuned so that the mean intraglobular distance and its variance were close to the values characteristic of proteins of corresponding size (about 15.5 A and 5.5 A, respectively).
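The two quantities used for tuning can be computed from any set of C-alpha coordinates, as in the sketch below. It assumes that "intraglobular distance" means the distance between every pair of C-alpha atoms and that the quoted spread (in A) is a standard deviation; both readings are assumptions, and the function name is hypothetical.

```python
import math
from itertools import combinations


def distance_stats(coords):
    """Mean and standard deviation of all pairwise C-alpha distances.

    coords : list of (x, y, z) tuples, one per residue, in Angstroms.
    """
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    mean = sum(distances) / len(distances)
    variance = sum((d - mean) ** 2 for d in distances) / len(distances)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Toy coordinates (a 3-4-5 triangle), not taken from any real structure.
    toy = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    mean, spread = distance_stats(toy)
    print(f"mean = {mean:.2f} A, spread = {spread:.2f} A")
```

For a reconstructed model, the two values could be compared with the target figures for a protein of that size to judge whether the reconstruction parameters need adjusting.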

The first structure obtained was used to re-edit the contact matrix (e.g., by removing stressed contacts) and to refine the predictions of both the secondary structure and the coordination number vector. The structure was then reconstructed again, and the cycle was repeated until near-convergence. In the resulting structures, most regions of regular secondary structure are significantly extended compared with the initially predicted locations.
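The re-editing step can be illustrated by a sketch that drops contacts the reconstructed model fails to satisfy. The criterion for a "stressed" contact is not defined above; here it is assumed, purely for illustration, to be a predicted contact whose C-alpha distance in the model exceeds a cutoff. The function name and the 8.0 A default are hypothetical.

```python
import math


def remove_stressed_contacts(contacts, coords, max_dist=8.0):
    """Return an edited copy of `contacts` plus the list of removed pairs.

    contacts : symmetric 0/1 matrix as a list of lists
    coords   : list of (x, y, z) C-alpha coordinates from the current model
    max_dist : assumed cutoff (in Angstroms) above which a contact is stressed
    """
    n = len(contacts)
    edited = [row[:] for row in contacts]
    removed = []
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j] and math.dist(coords[i], coords[j]) > max_dist:
                edited[i][j] = edited[j][i] = 0   # preserve symmetry
                removed.append((i, j))
    return edited, removed


if __name__ == "__main__":
    contacts = [[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]]
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    edited, removed = remove_stressed_contacts(contacts, coords)
    print(removed)
```

The edited matrix would then be passed back to the coordination number and contact matrix refinement before the next reconstruction.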

1. Galaktionov, S.G., Marshall, G.R. Proc. 27th Hawaii International Conference on System Sciences, Biotechnology Computing, IEEE Computer Society Press, V:326-335 (1994).
2. Rodionov, M.A., Galaktionov, S.G. Mol. Biol., 26:777-783 (1992).
