
PFRMAT DR
TARGET T0126
AUTHOR 1438-8319-9551
REMARK Predictions are for lack of fixed tertiary structure, 
REMARK that is, for "natively unfolded" or "intrinsically 
REMARK unstructured" segments. 
REMARK O=ordered, structured, folded
REMARK D=disordered, unstructured, unfolded
REMARK Last column is NOT a probability per se.  PONDR is a neural 
REMARK network predictor which produces a score.  Disorder is 
REMARK indicated by a score greater than 0.5.  
REMARK Studies have shown that neural network scores are roughly 
REMARK equivalent to probabilities. Short strings of amino acids 
REMARK predicted to be disordered occur more often due to chance 
REMARK than long strings of predicted disorder.
METHOD 
METHOD PONDR is a neural network Predictor of Natural Protein 
METHOD Disordered Regions.  The predictor used here is an integration
METHOD of three predictors, one for each terminus and one for internal
METHOD sequences (Romero, Obradovic, Li, Garner, Brown and Dunker, in 
METHOD press, Proteins: Structure, Function, Genetics).  For the 
METHOD internal sequences, a training set of 15 disordered regions 
METHOD having a total of 1149 residues was compiled and balanced by an 
METHOD equal number of ordered residues taken randomly from NRL_3D. Of 
METHOD the 15 disordered regions in the training set, 8 were 
METHOD characterized by X-ray diffraction (PDB IDs: 2tbv, 2ts1, 1aui, 
METHOD 1bgw, 1elo, 1af3, 1ati and 1lbh) and 7 by NMR (SW IDs: prio_mouse,
METHOD h5_chick, flgm_salty, regn_lambd, hsf_klula and hmgi_human; 
METHOD PIR accession: S50866).  
METHOD 
METHOD From an initial pool of 31 attributes, a branch and bound search 
METHOD was used to select 10 attributes that gave the best collective 
METHOD discrimination between order and disorder in the training 
METHOD set using a Mahalanobis distance criterion. The 31 attributes in 
METHOD the initial pool included the 20 amino acid compositions, two 
METHOD different hydropathy scales, flexibility index, alpha-moment, 
METHOD beta-moment, net charge (K + R - D - E), aromatic composition  
METHOD (W + F + Y), coordination number, codon number, alphabet size, 
METHOD and side chain volumes. The 10 attributes selected by this 
METHOD process were the fractions of W, Y, F, D, E, K and R; aromatic 
METHOD composition; coordination number; and net charge.
METHOD 
METHOD The back-propagation learning algorithm was used to train a 
METHOD feedforward neural network having the ten selected attributes as 
METHOD inputs, a fully connected hidden layer of ten neurons and a 
METHOD single output. To estimate errors, the training was repeated on 
METHOD 5 training subsets, each having 80% of the data (the held-out 
METHOD 20% portions being disjoint), with 3 different initializations, 
METHOD so neural network training was repeated 5 x 3 = 15 times. Once 
METHOD the accuracy was established by this 5-fold cross-validation 
METHOD procedure, a new neural network was trained to the same 
METHOD accuracy using all the data.
METHOD  
METHOD To enable prediction from the first to the last residue in a 
METHOD protein, disorder was partitioned according to position, with 
METHOD the development of separate predictors for the N-terminal and 
METHOD C-terminal regions (Li X, Romero P, Rani M, Dunker AK, Obradovic 
METHOD Z. Predicting protein disorder for N-, C-, and internal regions. 
METHOD Genome Informatics 1999;10:30-40).  These predictors used 8 
METHOD inputs.
METHOD 
METHOD The integration of the three predictors was carried out in 3 
METHOD steps. First, predictions were made by the three predictors over 
METHOD their respective domains, with overlapping predictions for 
METHOD positions 11 - 14 by the N-terminal and internal predictors, and, 
METHOD for a protein of length M, with overlapping predictions from M-14 
METHOD to M-11 by the C-terminal and internal predictors. Second, the 
METHOD values for each of the 4 pairs of overlapping predictions were 
METHOD averaged.  Third, the now integrated prediction outputs were 
METHOD smoothed by averaging over sliding windows of 9 amino acids, 
METHOD with the first and last 4 sequence positions being assigned the 
METHOD unsmoothed prediction output values from the N- and C-terminal 
METHOD predictors, respectively. This integrated predictor is used 
METHOD herein. 
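METHOD 
METHOD The three integration steps can be sketched as below. This is 
METHOD an illustrative sketch only, not the PONDR code. It assumes 
METHOD each predictor returns a mapping from 1-based position to 
METHOD score, and that the N-terminal, internal and C-terminal 
METHOD domains are 1..14, 11..M-11 and M-14..M (inferred from the 
METHOD stated overlaps). All function names are hypothetical.

```python
from statistics import mean

WINDOW = 9          # sliding-window width stated in the method text
HALF = WINDOW // 2  # 4 positions at each end are left unsmoothed


def merge_predictions(n_term, internal, c_term, length):
    """Steps 1-2: combine per-position scores, averaging where domains overlap.

    Each argument is a dict {1-based position: score}; the domain layout
    is an assumption of this sketch, not taken from the PONDR source.
    """
    merged = []
    for pos in range(1, length + 1):
        scores = [p[pos] for p in (n_term, internal, c_term) if pos in p]
        if not scores:
            raise ValueError(f"no predictor covers position {pos}")
        merged.append(mean(scores))
    return merged


def smooth(merged):
    """Step 3: average over a 9-residue window centred on each position.

    The first and last 4 positions keep their unsmoothed values, which
    come from the terminal predictors because only those cover the ends.
    """
    out = list(merged)
    for i in range(HALF, len(merged) - HALF):
        out[i] = mean(merged[i - HALF:i + HALF + 1])
    return out


def classify(scores, cutoff=0.5):
    """Label a residue D (disordered) when its score exceeds the cutoff."""
    return ["D" if s > cutoff else "O" for s in scores]
```

METHOD The 0.5 cutoff in classify() follows the REMARK above. Any 
METHOD tie-breaking or rounding applied before the score is written 
METHOD to the last column is not specified here and is not modelled.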
REMARK
REMARK The N-terminus to E50 is predicted to be disordered.  The region 
REMARK of disorder from A120 to F125 is too short to be significant.
REMARK
MODEL  1
M D 0.997
A D 0.991
E D 0.989
D D 0.982
G D 0.967
P D 0.959
Q D 0.942
K D 0.911
Q D 0.881
Q D 0.847
L D 0.827
E D 0.807
M D 0.761
P D 0.703
L D 0.629
V D 0.592
L D 0.571
D D 0.551
Q D 0.525
D O 0.491
L O 0.448
T O 0.434
Q O 0.449
Q D 0.523
M D 0.579
R D 0.631
L D 0.683
R D 0.746
V D 0.808
E D 0.880
S D 0.945
L D 0.990
K D 0.998
Q D 0.998
R D 0.998
G D 0.996
E D 0.996
K D 0.996
K D 0.992
Q D 0.991
D D 0.989
G D 0.973
E D 0.928
K D 0.891
L D 0.838
I D 0.771
R D 0.704
P D 0.660
A D 0.602
E D 0.507
S O 0.419
V O 0.364
Y O 0.301
R O 0.245
L O 0.202
D O 0.158
F O 0.096
I O 0.043
Q O 0.028
Q O 0.022
Q O 0.011
K O 0.000
L O 0.000
Q O 0.000
F O 0.000
D O 0.000
H O 0.000
W O 0.000
N O 0.000
V O 0.000
V O 0.000
L O 0.003
D O 0.010
K O 0.029
P O 0.103
G O 0.118
K O 0.128
V O 0.145
T O 0.177
I O 0.255
T O 0.332
G O 0.377
T O 0.375
S O 0.322
Q O 0.324
N O 0.321
W O 0.309
T O 0.290
P O 0.223
D O 0.155
L O 0.104
T O 0.088
N O 0.067
L O 0.052
M O 0.045
T O 0.045
R O 0.037
Q O 0.035
L O 0.042
L O 0.057
D O 0.077
P O 0.101
A O 0.142
A O 0.145
I O 0.148
F O 0.153
W O 0.164
R O 0.160
K O 0.149
E O 0.139
D O 0.117
S O 0.080
D O 0.118
A O 0.193
M O 0.274
D O 0.354
W O 0.417
N O 0.462
E O 0.494
A D 0.544
D D 0.583
A D 0.570
L D 0.551
E D 0.544
F D 0.530
G O 0.496
E O 0.454
R O 0.419
L O 0.372
S O 0.337
D O 0.308
L O 0.248
A O 0.172
K O 0.089
I O 0.045
R O 0.039
K O 0.035
V O 0.031
M O 0.025
Y O 0.026
F O 0.026
L O 0.020
I O 0.019
T O 0.022
F O 0.022
G O 0.050
E O 0.093
G O 0.143
V O 0.183
E O 0.243
P O 0.290
A O 0.330
N O 0.360
L O 0.387
K O 0.387
A O 0.376
S O 0.353
V O 0.338
V O 0.297
F O 0.307
N O 0.270
Q O 0.233
L O 0.187
END


