Maximum Likelihood Periodic Quadratic-Logistic Profile Predictions
We have submitted secondary structure predictions for the proteins ipns, pbdg, ppdk, prosub, l14, staufen3, and mystery. Our method is a maximum-likelihood quadratic-logistic (QL) discrimination model based on profiles [1,2]. Briefly, we have calibrated a logistic model for three-state prediction using the maximum likelihood principle, assuming that the secondary structural state obeys an independent trinomial probability model. The logistic model includes linear or "main-effect" terms for every amino acid residue within a 17-residue window of the state to be predicted, together with certain quadratic or "pair-wise" effects. Specifically, we assume a 3.6-residue period for the helix component and a 2.0-residue period for the strand component, and multiply the residue pair-wise term by cos(2*pi*k/3.6) or cos(2*pi*k/2.0), respectively. The 2*20*20 = 800 residue-pair preference parameters (400 for each of the two periods) are estimated along with the main-effect terms, using a penalized maximum-likelihood technique. The cross-validated prediction rate for this method is 62.5% on single sequences.
The profile method begins with a set of aligned homologous sequences. Rather than representing each sequence element by a 20-vector of zeros and ones (dummy variables), it uses the proportions of each residue seen at an aligned position, giving a 20-vector of proportions. For quadratic terms, we replace the 400-vector of dummy variables representing the residue pairs observed within a window with the corresponding 400-vector of proportions.
Alignments were made by first choosing homologues from Swiss-Prot or PIR with greater than 20% homology on stretches longer than 80 residues, and then using either pairwise or multiple alignment programs (CLUSTAL, PILE-UP) to determine the alignments. Alignments were reviewed manually to remove obviously spurious homologues or spurious portions of the alignment. Areas with gaps in the final alignment were arbitrarily assigned a high coil probability.
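
The profile encoding and the periodic weights described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy alignment, the choice to ignore gap characters when computing proportions, and the reading of k as the separation between the two residues of a pair are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def profile_column(column):
    """Return the 20-vector of residue proportions for one alignment column.

    Gap characters and non-standard symbols are ignored (an assumption).
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def profile(alignment):
    """Return one 20-vector of proportions per aligned position."""
    length = len(alignment[0])
    return [profile_column([seq[i] for seq in alignment]) for i in range(length)]


def periodic_weight(k, period):
    """Cosine weight cos(2*pi*k/period) for a residue pair separated by k positions."""
    return math.cos(2.0 * math.pi * k / period)


# Toy alignment of four hypothetical sequences ('-' marks a gap).
alignment = ["ACD", "ACE", "A-E", "GCE"]
prof = profile(alignment)

# Weights for separations k = 1..8, i.e. within half of a 17-residue window.
helix_weights = [periodic_weight(k, 3.6) for k in range(1, 9)]
strand_weights = [periodic_weight(k, 2.0) for k in range(1, 9)]
```

With a period of 2.0 the weights alternate between -1 and +1, while a period of 3.6 returns close to +1 only after a whole number of helical turns (for example at k = 18).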
The expected percent correct for QL using profiles is 67% to 69%, based on two separate cross-validated tests.
[1] Munson, P. J., V. Di Francesco and R. Porrelli. Protein Secondary
Structure Prediction using Periodic-Quadratic-Logistic Models:
Theoretical and Practical Issues. 27th Annual Hawaii International
Conference on System Sciences. 5: 375-384, 1994.
and updated in:
[2] Di Francesco, V., P. J. Munson and J. Garnier. Use of Multiple
Alignments in Protein Secondary Structure Prediction. 28th Hawaii
International Conference on System Sciences. (accepted), 1995.