Yo Matsuo and Ken Nishikawa
Protein Engineering Research Institute, 6-2-3 Furuedai, Suita, Osaka 565, Japan, matsuo@peri.co.jp, nishikawa@peri.co.jp

A Method for Evaluating Protein Sequence-Structure Compatibility

Four evaluation functions, side-chain packing (Fsp), solvation (Fsolv), hydrogen-bonding (Fhb), and local structure (Floc) functions, were used to evaluate the compatibility of an amino acid sequence with a structure [1-4]. They have the following general form:

Fx = -log (fx(a;s)/fx(s)), x = {sp, solv, hb, loc},

where a denotes the type of amino acid residue (for Fsolv and Floc) or residue pair (for Fsp and Fhb); s, the state of a (spatial relationship between residues for Fsp, solvent-accessibility for Fsolv, hydrogen-bonded or not for Fhb, and local structure for Floc); fx(a;s), the frequency of a in the state s; and fx(s), the frequency of any residue or residue pair in the state s.
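As an illustration of the general form above, the following minimal sketch tabulates -log(f(a;s)/f(s)) from a list of observed (a, s) pairs. It assumes that f(a;s) is the fraction of occurrences of a that are found in state s and that f(s) is the fraction of all observations found in state s; this is one reading of the definitions, and the normalization actually used is given in [1-3]. The function name log_odds_table and the input layout are hypothetical.

```python
import math
from collections import Counter

def log_odds_table(observations):
    """Return {(a, s): -log(f(a;s) / f(s))} for every observed (a, s) pair.

    Assumption: f(a;s) = N(a, s) / N(a) and f(s) = N(s) / N_total.
    Pairs that were never observed get no entry (no pseudocounts here).
    """
    pair_counts = Counter(observations)
    a_counts = Counter()
    s_counts = Counter()
    for (a, s), n in pair_counts.items():
        a_counts[a] += n
        s_counts[s] += n
    total = sum(pair_counts.values())

    table = {}
    for (a, s), n in pair_counts.items():
        f_as = n / a_counts[a]      # frequency of a in state s
        f_s = s_counts[s] / total   # frequency of any residue in state s
        table[(a, s)] = -math.log(f_as / f_s)
    return table

# Toy counts, invented purely for illustration.
toy = ([("L", "buried")] * 3 + [("L", "exposed")] * 1
       + [("K", "buried")] * 1 + [("K", "exposed")] * 3)
F = log_odds_table(toy)
```

With these toy counts, F[("L", "buried")] is negative (favourable) and F[("K", "buried")] is positive (unfavourable), matching the convention that a more negative value indicates better compatibility.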

The side-chain packing function (Fsp) indicates the propensity of a residue pair (a,b) to be in contact in a particular spatial relationship. The spatial relationship between two residues was defined by the distance between their Cb atoms and the angle between the residues. The angle between residues i and j was defined as the sum of the angles Cb(i)-Ca(i)-Cb(j) and Cb(j)-Ca(j)-Cb(i).
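The spatial relationship described above can be computed directly from the Ca and Cb coordinates of the two residues. The sketch below is only illustrative: the function names are hypothetical, coordinates are taken as (x, y, z) tuples in angstroms, and the angle is reported in degrees, which is an assumption (the binning of distance and angle into discrete states is not shown).

```python
import math

def _angle(vertex, p, q):
    """Angle in degrees at `vertex` between the directions to p and q."""
    u = [pc - vc for pc, vc in zip(p, vertex)]
    v = [qc - vc for qc, vc in zip(q, vertex)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    cos_value = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_value))

def pair_geometry(ca_i, cb_i, ca_j, cb_j):
    """Return (Cb-Cb distance, angle between residues i and j).

    The angle is the sum of Cb(i)-Ca(i)-Cb(j) and Cb(j)-Ca(j)-Cb(i).
    """
    distance = math.dist(cb_i, cb_j)
    angle = _angle(ca_i, cb_i, cb_j) + _angle(ca_j, cb_j, cb_i)
    return distance, angle

# Two side chains pointing towards each other along the x axis.
facing = pair_geometry((0, 0, 0), (1, 0, 0), (3, 0, 0), (2, 0, 0))
# Two side chains pointing away from each other.
opposed = pair_geometry((0, 0, 0), (-1, 0, 0), (3, 0, 0), (4, 0, 0))
```

Under this definition the summed angle is small when the two side chains point towards each other and approaches 360 degrees when they point away from each other.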

For the other three functions (Fsolv, Fhb, and Floc), the state of a residue or residue pair was defined as follows. Solvent accessibility of a residue was defined by the number of main-chain and Cb atoms within a shell between 8 and 12 angstroms from the Cb atom of the residue. Hydrogen bonds were defined by the DSSP algorithm [5]. Local structures (backbone conformations) of residues were classified into five classes: a-helical, right-handed helical, b-strand, extended, and left-handed helical.
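The shell-based accessibility measure amounts to counting atoms whose distance from the Cb atom falls between the two radii. A minimal sketch follows, assuming the atom list already contains only main-chain and Cb atoms and that both shell boundaries are inclusive (the text does not specify this); the function name shell_atom_count is hypothetical.

```python
import math

def shell_atom_count(cb, atoms, inner=8.0, outer=12.0):
    """Count atoms lying in the shell inner <= d <= outer around `cb`.

    cb    : (x, y, z) of the residue's Cb atom, in angstroms
    atoms : iterable of (x, y, z) for main-chain and Cb atoms
    A low count suggests an exposed residue, a high count a buried one.
    """
    return sum(1 for atom in atoms if inner <= math.dist(cb, atom) <= outer)

# Toy coordinates on the x axis at 5, 9, 10, 11 and 13 angstroms from Cb.
toy_atoms = [(5, 0, 0), (9, 0, 0), (10, 0, 0), (11, 0, 0), (13, 0, 0)]
count = shell_atom_count((0, 0, 0), toy_atoms)
```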

A sequence was threaded onto a structure using the 3D-profile method [6,7]. The 3D-profile of a structure was constructed as in [7] using the above functions. The sequence was aligned with the 3D-profile using the Needleman-Wunsch algorithm [8] and then mounted on the structure according to the resulting alignment. For the mounted sequence, scores Sx (x = {sp, solv, hb, loc}) were obtained by summing the values of Fx over all residues or residue pairs. The Sx scores were then added to give Stot, which measures the compatibility of the sequence with the structure. A more negative score indicates better compatibility.
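The alignment step can be sketched as a global dynamic-programming alignment of a sequence against position-specific scores. The code below is a simplified illustration, not the procedure of [7]: it assumes the profile is already available as one dictionary per structural position mapping amino acid to score, uses a constant per-position gap penalty (a hypothetical choice), and minimizes the total because lower scores are better here.

```python
def align_to_profile(sequence, profile, gap=1.0):
    """Needleman-Wunsch style global alignment, minimizing the total score.

    sequence : string of one-letter amino acid codes
    profile  : list of dicts; profile[j][aa] is the score for placing
               amino acid aa at structural position j (lower is better)
    gap      : penalty added for each unaligned position (assumed linear)
    Returns (total score, list of (sequence index, structure index)).
    """
    n, m = len(sequence), len(profile)
    # dp[i][j]: best score for sequence[:i] aligned with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + profile[j - 1][sequence[i - 1]]
            dp[i][j] = min(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)

    # Trace back from the corner to recover which residue sits where.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + profile[j - 1][sequence[i - 1]]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

# Toy three-position profile over a two-letter alphabet (invented values).
toy_profile = [{"A": -1.0, "G": 1.0}, {"A": 1.0, "G": -1.0}, {"A": -1.0, "G": 1.0}]
full_score, full_pairs = align_to_profile("AGA", toy_profile)
short_score, short_pairs = align_to_profile("AA", toy_profile)
```

In the toy example the two-residue sequence skips the middle position, which disfavours A, at the cost of one gap penalty. The pair-dependent terms (Fsp and Fhb) would be evaluated on the mounted sequence after the alignment and are not modelled in this sketch.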

A sequence was compared with a library of known structures. In the present work, 325 structures sharing less than 30% sequence identity with one another were taken from the PDB. A compatibility score Stot was calculated for each structure, and the scores were expressed in units of standard deviations from the mean (see [3] for details).
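Expressing the scores in standard deviation units can be illustrated as follows. The sketch assumes the mean and standard deviation are taken over the scores of all library structures for one query sequence and uses the population standard deviation; the exact normalization is described in [3]. The structure labels and score values are invented for illustration.

```python
import statistics

def z_scores(raw_scores):
    """Convert {structure: Stot} into {structure: (Stot - mean) / sd}."""
    values = list(raw_scores.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)
    return {name: (s - mean) / sd for name, s in raw_scores.items()}

# Hypothetical raw scores for three library structures.
toy_scores = {"structA": 1.0, "structB": 2.0, "structC": 3.0}
z = z_scores(toy_scores)
best = min(z, key=z.get)  # most negative score = best compatibility
```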

[1] Nishikawa, K. and Matsuo, Y. (1993) Protein Eng., 6, 811-820.
[2] Matsuo, Y. and Nishikawa, K. (1994) FEBS Lett., 345, 23-26.
[3] Matsuo, Y. and Nishikawa, K. (1994) Protein Sci., in press.
[4] Amano, T., Yoshida, M., Matsuo, Y. and Nishikawa, K. (1994) FEBS Lett., 351, 1-5.
[5] Kabsch, W. and Sander, C. (1983) Biopolymers, 22, 2577-2637.
[6] Bowie, J.U., Luthy, R. and Eisenberg, D. (1991) Science, 253, 164-170.
[7] Wilmanns, M. and Eisenberg, D. (1993) Proc. Natl. Acad. Sci. USA, 90, 1379-1383.
[8] Needleman, S.B. and Wunsch, C.D. (1970) J. Mol. Biol., 48, 443-453.

List of Structures Predicted:

a         b                                            c
--------------------------------------------------------------
L14       L14 (prokaryotic ribosomal protein)          1GCR
bhted     beta-hydroxydecanoyl thiol ester dehydrase   1OFV
bphc      biphenyl-2,3-diol 1,2-dioxygenase            2LBP
ce-1      Chymotrypsin/Elastase Inhibitor-1            1TBPA
chmut     N-terminal of P-protein (Chorismate Mutase)  2REB
kau       Urease from Klebsiella aerogenes subunit A   8ACN
                                           subunit B   2MCM
                                           subunit G   2ER7E
mystery   A mystery sequence                           2GBP
pbdg      6-Phospho-beta-D-galactosidase               3LADA
ppdk      pyruvate phosphate dikinase (PPDK) domain 1  2MNR
                                             domain 2  1IPD
                                             domain 3  2AAIB
                                             domain 4  1PII
prosub    propeptide from subtilisin BPN'              8DFR
rtp       Replication Terminator Protein               1ABK
smanucecs extracellular endonuclease                   2ER7E
staufen3  Domain 3 of Staufen                          5TIMA
synapto   First C2 domain of synaptotagmin             1ATND
--------------------------------------------------------------
a. Abbreviation of the name of the protein
b. Name of the protein whose structure was predicted
c. PDB code of the structure which showed the best compatibility score.
1GCR, Gamma-II crystallin; 1OFV, flavodoxin;
2LBP, leucine-binding protein;
1TBPA, C-terminal 179 amino acids of TATA-binding protein;
2REB, RecA protein; 8ACN, aconitase; 2MCM, macromomycin;
2ER7E, endothiapepsin; 2GBP, galactose/glucose binding protein;
3LADA, dihydrolipoamide dehydrogenase; 2MNR, mandelate racemase;
1IPD, 3-isopropylmalate dehydrogenase; 2AAIB, ricin;
1PII, N-(5'-phosphoribosyl)anthranilate isomerase:indole-3-glycerol-phosphate synthase;
8DFR, dihydrofolate reductase; 1ABK, endonuclease III;
5TIMA, triosephosphate isomerase; 1ATND, deoxyribonuclease I.

