
PFRMAT DR
TARGET T0105
AUTHOR 1438-8319-9551
REMARK Predictions are for lack of fixed tertiary structure,
REMARK that is, for "natively unfolded" or "intrinsically
REMARK unstructured" segments.
REMARK O=ordered, structured, folded
REMARK D=disordered, unstructured, unfolded
REMARK The last column is NOT a probability per se.  PONDR is a
REMARK neural network predictor that produces a score; residues
REMARK with a score greater than 0.5 are labeled disordered (D).
REMARK Studies have shown that neural network scores are roughly
REMARK equivalent to probabilities. Short runs of residues
REMARK predicted to be disordered are more likely to arise by
REMARK chance than long runs of predicted disorder.
METHOD
METHOD PONDR is a neural network Predictor of Natural Protein
METHOD Disordered Regions.  The predictor used here is an integration
METHOD of three predictors, one for each terminus and one for internal
METHOD sequences (Romero, Obradovic, Li, Garner, Brown and Dunker, in
METHOD press, Proteins: Structure, Function, Genetics).  For the
METHOD internal sequences, a training set of 15 disordered regions
METHOD having a total of 1149 residues was compiled and balanced by an
METHOD equal number of ordered residues taken randomly from NRL_3D. Of
METHOD the 15 disordered regions in the training set, 8 were
METHOD characterized by X-ray diffraction (PDB IDs: 2tbv, 2ts1, 1aui,
METHOD 1bgw, 1elo, 1af3, 1ati and 1lbh) and 7 by NMR (SW IDs: prio_mouse,
METHOD h5_chick, flgm_salty, regn_lambd, hsf_klula and hmgi_human;
METHOD PIR accession: S50866).
METHOD
METHOD From an initial pool of 31 attributes, a branch and bound search
METHOD was used to select 10 attributes that gave the best collective
METHOD discrimination between order and disorder in the training
METHOD set using a Mahalanobis distance criterion. The 31 attributes in
METHOD the initial pool included the 20 amino acid compositions, two
METHOD different hydropathy scales, flexibility index, alpha-moment,
METHOD beta-moment, net charge (K + R - D - E), aromatic composition
METHOD (W + F + Y), coordination number, codon number, alphabet size,
METHOD and side chain volumes. The attributes selected by this process
METHOD were fraction of W, Y, F, D, E, K, R, aromatic composition,
METHOD coordination number, and net charge.
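METHOD
METHOD As a minimal sketch of the composition-based attributes named
METHOD above, the Python below computes the seven residue fractions,
METHOD the aromatic composition (W + F + Y) and the net charge
METHOD (K + R - D - E) for one sequence window. The function name,
METHOD the normalization of every attribute by window length, and the
METHOD omission of coordination number (its per-residue scale is not
METHOD given here) are assumptions for illustration, not the PONDR
METHOD implementation.

```python
from collections import Counter


def window_attributes(window: str) -> dict:
    """Composition attributes for one sequence window (illustrative only)."""
    if not window:
        raise ValueError("window must contain at least one residue")
    n = len(window)
    counts = Counter(window.upper())

    # Fractions of the seven individually selected residues.
    attrs = {aa: counts[aa] / n for aa in "WYFDEKR"}

    # Aromatic composition as written in the text: W + F + Y.
    attrs["aromatic"] = (counts["W"] + counts["F"] + counts["Y"]) / n

    # Net charge as written in the text: K + R - D - E.
    # Dividing by window length is an assumption of this sketch.
    attrs["net_charge"] = (
        counts["K"] + counts["R"] - counts["D"] - counts["E"]
    ) / n
    return attrs
```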
METHOD
METHOD The back-propagation learning algorithm was used to train a
METHOD feedforward neural network having the ten selected attributes as
METHOD inputs, a fully connected hidden layer of ten neurons and a
METHOD single output. To estimate errors, the training was repeated on
METHOD 5 subsets, each having 80% of the data (5-fold cross-validation),
METHOD with 3 different initializations, so neural network training
METHOD was repeated 5 x 3 = 15 times. Once the accuracy was established
METHOD by this 5-fold cross-validation procedure, a new neural network
METHOD was trained to the same accuracy using all the data.
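METHOD
METHOD The 5 x 3 = 15 training runs can be laid out as in the
METHOD following sketch. It assumes that each 80% training subset is
METHOD the complement of one of five non-overlapping 20% held-out
METHOD portions; the function name, the shuffling seed and the use of
METHOD integers as initialization seeds are hypothetical. Network
METHOD training itself is not shown.

```python
import random


def cv_schedule(n_samples, n_folds=5, n_inits=3, seed=0):
    """List the (fold, initialization) runs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Split the shuffled indices into n_folds non-overlapping held-out sets.
    folds = [indices[k::n_folds] for k in range(n_folds)]

    runs = []
    for k, held_out in enumerate(folds):
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Repeat each fold with several weight initializations.
        for init_seed in range(n_inits):
            runs.append(
                {
                    "fold": k,
                    "init_seed": init_seed,
                    "train": train,
                    "test": sorted(held_out),
                }
            )
    return runs
```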
METHOD
METHOD To enable prediction from the first to the last residue in a
METHOD protein, disorder was partitioned according to position, with
METHOD the development of separate predictors for the N-terminal and
METHOD C-terminal regions (Li X, Romero P, Rani M, Dunker AK, Obradovic
METHOD Z. Predicting protein disorder for N-, C-, and internal regions.
METHOD Genome Informatics 1999;10:30-40).  These predictors used 8
METHOD inputs.
METHOD
METHOD The integration of the three predictors was carried out in 3
METHOD steps. First, predictions were made by the three predictors over
METHOD their respective domains, with overlapping predictions for
METHOD positions 11 - 14 by the N-terminal and internal predictors, and,
METHOD for a protein of length M, with overlapping predictions from M-14
METHOD to M-11 by the C-terminal and internal predictors. Second, the
METHOD values for each of the 4 pairs of overlapping predictions were
METHOD averaged.  Third, the now-integrated prediction outputs were
METHOD smoothed by averaging over sliding windows of 9 amino acids,
METHOD with the first and last 4 sequence positions being assigned the
METHOD unsmoothed prediction output values from the N- and C-terminal
METHOD predictors, respectively. This integrated predictor is used
METHOD herein.
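METHOD
METHOD The three integration steps can be sketched as below. The
METHOD sketch assumes each predictor's output is supplied as a list
METHOD of length M holding a score where that predictor applies and
METHOD None elsewhere, and that only the terminal predictors cover
METHOD the first and last 4 positions. The function name and input
METHOD layout are hypothetical, not the PONDR code.

```python
def integrate(n_term, internal, c_term, window=9):
    """Merge three per-position score lists, then smooth the interior."""
    m = len(internal)
    if not (len(n_term) == m and len(c_term) == m):
        raise ValueError("all three score lists must have the same length")

    # Steps 1 and 2: take each predictor's output over its own domain and
    # average wherever two predictors overlap.
    merged = []
    for i in range(m):
        vals = [p[i] for p in (n_term, internal, c_term) if p[i] is not None]
        if not vals:
            raise ValueError(f"no prediction available at index {i}")
        merged.append(sum(vals) / len(vals))

    # Step 3: sliding-window average over the interior. The first and last
    # window // 2 positions keep their unsmoothed values, which come from
    # the terminal predictors under the assumption stated in the lead-in.
    half = window // 2
    smoothed = list(merged)
    for i in range(half, m - half):
        smoothed[i] = sum(merged[i - half : i + half + 1]) / window
    return smoothed


def labels(scores, cutoff=0.5):
    """Label a residue D when its score exceeds the cutoff, otherwise O."""
    return ["D" if s > cutoff else "O" for s in scores]
```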
REMARK The C-terminus may be disordered.
MODEL  1
D       O       0.004
E       O       0.025
N       O       0.161
I       O       0.288
N       O       0.297
F       O       0.338
K       O       0.386
Q       O       0.412
S       O       0.404
E       O       0.379
L       O       0.337
P       O       0.274
V       O       0.245
T       O       0.206
C       O       0.171
G       O       0.126
E       O       0.083
V       O       0.061
K       O       0.036
G       O       0.041
T       O       0.039
L       O       0.037
Y       O       0.036
K       O       0.047
E       O       0.047
R       O       0.050
F       O       0.069
K       O       0.140
Q       O       0.207
G       O       0.268
T       O       0.288
S       O       0.311
K       O       0.321
K       O       0.378
C       O       0.376
I       O       0.362
Q       O       0.295
S       O       0.233
E       O       0.179
D       O       0.164
K       O       0.177
K       O       0.211
W       O       0.217
F       O       0.286
T       O       0.354
P       O       0.432
R       O       0.455
E       O       0.479
F       O       0.478
E       O       0.485
I       D       0.506
E       D       0.533
G       D       0.546
D       D       0.533
R       O       0.468
G       O       0.473
A       O       0.446
S       O       0.443
K       O       0.474
N       O       0.459
W       O       0.453
K       O       0.393
L       O       0.333
S       O       0.317
I       O       0.292
R       O       0.285
C       O       0.277
G       O       0.199
G       O       0.134
Y       O       0.053
T       O       0.039
L       O       0.058
K       O       0.059
V       O       0.084
L       O       0.091
M       O       0.093
E       O       0.147
N       O       0.203
K       O       0.253
F       O       0.303
L       O       0.391
P       O       0.499
E       D       0.569
P       D       0.653
P       D       0.743
S       D       0.758
T       D       0.741
R       D       0.715
K       D       0.724
K       D       0.648
V       O       0.375
T       O       0.290
I       D       0.626
K       O       0.288
END


