Description of the Method
Threading runs were performed using the contact potential and alignment model described in (1). Segments of the trial sequence are aligned with explicitly defined core elements of the folding motif, that is, with individual strands or helices. No gaps are allowed within core elements, but the lengths of the intervening loops are free to vary within specified limits. The threading score is the sum of contact potentials for a given alignment, less the sum expected for random sequences of the same composition; no gap penalties are employed. Note that this alignment model requires every residue site within a core motif to be aligned with a residue from the threaded sequence. We thus search for a complete core structure, or that of a pre-defined structural domain, somewhere within the threaded sequence, but we do not attempt to identify matches with arbitrary structural fragments. We do, however, search for smaller structural domains within long sequences by attempting to identify matches with all core motifs that have fewer residue sites than the sequence has residues. In this search the lengths of the sequence segments N- or C-terminal to a core motif are neither constrained nor considered in scoring.
For the Asilomar contest we use two new techniques not described in (1). The first is a fast heuristic algorithm to search for favorable alignments, a Monte Carlo procedure based on the Gibbs Sampling algorithm. Subsequence blocks are initially aligned at random with core elements. We then sample the allowed alignments of each element in turn, in the field defined by the others, and choose new alignments according to the Boltzmann probabilities of the alternative model structures generated in this fashion. This procedure is iterated, and in control experiments it can be shown to converge to ensembles containing the correct native alignment.
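The sampling procedure can be illustrated with a minimal sketch. It is not the implementation used for the contest: the function names, the two-letter H/P alphabet in the demo, the temperature parameter kT, the fixed element lengths, and the mean-field estimate of the random-sequence reference term are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def contact_energy(seq, starts, contacts, potential):
    """Sum pair potentials over core contacts for one alignment.

    contacts holds ((elem_a, offset_a), (elem_b, offset_b)) site pairs;
    potential maps a sorted residue pair to an energy (hypothetical table).
    """
    total = 0.0
    for (ea, oa), (eb, ob) in contacts:
        pair = tuple(sorted((seq[starts[ea] + oa], seq[starts[eb] + ob])))
        total += potential[pair]
    return total


def reference_energy(seq, n_contacts, potential):
    """Assumed reference term: mean pair energy at this composition."""
    counts, n = Counter(seq), len(seq)
    mean = sum(counts[a] * counts[b] / (n * n) * potential[tuple(sorted((a, b)))]
               for a in counts for b in counts)
    return n_contacts * mean


def allowed_starts(k, starts, lengths, n, loop_min, loop_max):
    """Start positions for element k that keep both flanking loops in range."""
    lo, hi = 0, n - lengths[k]  # terminal tails are unconstrained
    if k > 0:
        prev_end = starts[k - 1] + lengths[k - 1]
        lo, hi = max(lo, prev_end + loop_min), min(hi, prev_end + loop_max)
    if k < len(starts) - 1:
        nxt = starts[k + 1]
        lo = max(lo, nxt - lengths[k] - loop_max)
        hi = min(hi, nxt - lengths[k] - loop_min)
    return range(lo, hi + 1)


def random_alignment(lengths, n, loop_min, loop_max, rng):
    """Draw random loop lengths and a random offset that fit the sequence."""
    for _ in range(1000):
        loops = [rng.randint(loop_min, loop_max) for _ in lengths[1:]]
        span = sum(lengths) + sum(loops)
        if span <= n:
            starts = [rng.randint(0, n - span)]
            for length, loop in zip(lengths, loops):
                starts.append(starts[-1] + length + loop)
            return starts
    raise ValueError("core motif does not fit in the sequence")


def gibbs_thread(seq, lengths, contacts, potential,
                 loop_min=1, loop_max=10, sweeps=200, kT=1.0, seed=0):
    """Return (best_starts, best_energy, score) seen during sampling."""
    rng = random.Random(seed)
    n = len(seq)
    starts = random_alignment(lengths, n, loop_min, loop_max, rng)
    best_e, best = contact_energy(seq, starts, contacts, potential), list(starts)
    for _ in range(sweeps):
        for k in range(len(lengths)):
            # Enumerate alignments of element k in the field of the others.
            options = list(allowed_starts(k, starts, lengths, n, loop_min, loop_max))
            energies = []
            for s in options:
                starts[k] = s
                energies.append(contact_energy(seq, starts, contacts, potential))
            e_min = min(energies)
            weights = [math.exp(-(e - e_min) / kT) for e in energies]
            # Choose the new alignment with its Boltzmann probability.
            starts[k] = rng.choices(options, weights=weights)[0]
            e = energies[options.index(starts[k])]
            if e < best_e:
                best_e, best = e, list(starts)
    score = best_e - reference_energy(seq, len(contacts), potential)
    return best, best_e, score


if __name__ == "__main__":
    toy_potential = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
    toy_contacts = [((0, i), (1, i)) for i in range(3)]
    print(gibbs_thread("PPPHHHPPHHHPPP", [3, 3], toy_contacts, toy_potential,
                       loop_min=1, loop_max=4))
```

The sketch records the lowest-energy alignment visited, whereas the text describes convergence to an ensemble of alignments; collecting the sampled states instead of a single best state would be closer to that description.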
In the Gibbs Sampling algorithm we also allow the precise endpoints of core elements within the known structure to vary within specified limits. Their values are again chosen stochastically, based on the Boltzmann probability of the model structure generated by extension or contraction of a core element.
The second new technique is a search of an explicitly defined database of core motifs. Large helices and beta strands in structures from the Protein Data Bank are identified by alpha-carbon distance templates and trimmed from their ends until only a certain fraction of their contacts with other core elements remains. The minimal core motif defined in this way is intended to mimic a substructure one can expect to be conserved in any protein with the same "fold". In addition to minimal cores for complete structures in the Protein Data Bank, we define core motifs for large, chain-continuous globular domains, identified on the basis of chain breaks which produce a high ratio of intra- to inter-domain contacts. In threading we must align some segment from a contest sequence with a minimal core motif, but core elements may also be extended, via the Gibbs algorithm, to include more residue sites from the known structure. The precise boundaries of the core elements in a model structure are thus given as part of the threading results.
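The end-trimming step that produces a minimal core element might be sketched as follows. The text does not give the exact rule, so this is one possible reading: residues are removed from whichever end has fewer contacts for as long as the retained contacts stay at or above a chosen fraction of the original total. The function name, the per-residue contact counts, and the 0.8 default are hypothetical.

```python
def trim_element(contact_counts, keep_fraction=0.8):
    """Trim a helix or strand from its ends; return half-open (start, end).

    contact_counts[i] is the number of contacts residue i of the element
    makes with other core elements (assumed to be precomputed).
    """
    total = sum(contact_counts)
    lo, hi = 0, len(contact_counts)
    kept = total
    while hi - lo > 1:
        left, right = contact_counts[lo], contact_counts[hi - 1]
        drop_left = left <= right  # remove the less-connected end first
        cost = left if drop_left else right
        if kept - cost < keep_fraction * total:
            break  # a further cut would lose too many contacts
        kept -= cost
        if drop_left:
            lo += 1
        else:
            hi -= 1
    return lo, hi


if __name__ == "__main__":
    print(trim_element([0, 1, 4, 5, 4, 1, 0], keep_fraction=0.8))
```

Under this reading, loosely packed terminal residues are dropped first, and the sites that remain are those most likely to belong to the conserved substructure.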
(1) Bryant, S.H., Lawrence, C.E., An Empirical Energy Function for Threading Protein Sequence through the Folding Motif, Proteins, 16:92-112 (1993).