PFRMAT TS 
TARGET T0052 
AUTHOR 9070-5088-8627 
REMARK  
REMARK Prediction date: Wednesday June 10, 1998 
REMARK Group name: UCSC-compbio 
REMARK Authors: Christian Barrett, Melissa Cline, Mark Diekhans, Kevin Karplus, 
REMARK 	 David Haussler and Richard Hughey 
REMARK University of California, Santa Cruz 
REMARK  
METHOD  
METHOD UCSC Computational Biology 
METHOD  
METHOD All experiments were performed with SAM version 2.1.1 [1], using a 
METHOD refinement of the methods used by this group in CASP2 [2].   
METHOD  
METHOD Overview of the method 
METHOD  
METHOD Fold recognition was performed using the Target98 (SAM-T98) method 
METHOD [3].  This method attempts to find and multiply align a set of 
METHOD homologs to a given sequence, then create an HMM from that multiple 
METHOD alignment. 
METHOD  
METHOD First, a set of sequence weights is determined from the alignment.  Next,  
METHOD Modelfromalign is used to build the model from the alignment and the  
METHOD sequence weights.  Finally, hmmscore performs a local, all-paths scoring  
METHOD of the sequences, using a reversed-sequence normalization feature. 
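METHOD 
METHOD As a minimal illustration of reversed-sequence normalization, the 
METHOD Python sketch below subtracts the cost of the reversed sequence from 
METHOD the cost of the sequence itself.  It is not the hmmscore 
METHOD implementation: the ungapped toy profile, the function names, and the 
METHOD 1e-3 floor for unseen residues are all assumptions made for 
METHOD illustration, and the local, all-paths scoring is not reproduced. 

```python
import math


def nll(profile, seq):
    """Negative log-likelihood (nats) of seq under a toy ungapped profile.

    profile is a list of {residue: probability} dicts, one per position.
    Residues missing from a column get a small floor probability (assumed).
    """
    return -sum(math.log(col.get(res, 1e-3)) for col, res in zip(profile, seq))


def reverse_normalized_score(profile, seq):
    """Cost of seq minus cost of its reversal; more negative is better."""
    return nll(profile, seq) - nll(profile, seq[::-1])


# Hypothetical three-column profile that prefers the sequence "ACD".
toy_profile = [
    {"A": 0.8, "C": 0.1, "D": 0.1},
    {"A": 0.1, "C": 0.8, "D": 0.1},
    {"A": 0.1, "C": 0.1, "D": 0.8},
]
```

METHOD With this toy profile, "ACD" receives a negative score, its reversal 
METHOD "DCA" a positive one, and a palindrome such as "ACA" scores zero, 
METHOD because the sequence and its reversal cost the same. 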
METHOD  
METHOD The weighting method, detailed in upcoming publications [3,4], 
METHOD combines the Henikoffs' scheme [5], Dirichlet mixtures [6], and an 
METHOD entropy method to set the final weights. 
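METHOD 
METHOD Of the three ingredients, only the Henikoffs' position-based scheme 
METHOD [5] is sketched below, in Python.  The function name is hypothetical, 
METHOD gap characters are treated as ordinary symbols (an assumption), and 
METHOD the Dirichlet-mixture and entropy steps that set the final weights 
METHOD are not shown. 

```python
from collections import Counter


def henikoff_weights(alignment):
    """Position-based sequence weights for equal-length aligned rows.

    In each column, a row earns 1 / (k * n), where k is the number of
    distinct symbols in the column and n is how many rows share this
    row's symbol.  Weights are summed over columns and normalized to 1.
    """
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)
        for i, res in enumerate(column):
            weights[i] += 1.0 / (k * counts[res])
    total = sum(weights)
    return [w / total for w in weights]
```

METHOD For the toy alignment ["AAA", "AAA", "CCC"], the two identical rows 
METHOD share weight and the result is 0.25, 0.25, 0.5. 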
METHOD  
METHOD Alignment generation 
METHOD  
METHOD The initial step uses WU-BLAST (BLASTP version 2.0aMP from Washington 
METHOD University) to select potential homologs from the non-redundant protein 
METHOD database (NRP).  NRP is searched twice to produce two sets of homologs: 
METHOD one of very close homologs (E<0.00003) and one of possible homologs 
METHOD (E<500). 
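METHOD 
METHOD The two E-value cutoffs can be illustrated with a short Python 
METHOD sketch.  It assumes the hits are already available as (identifier, 
METHOD E-value) pairs and filters a single list, whereas the actual 
METHOD procedure searches the database twice; the function and constant 
METHOD names are hypothetical. 

```python
CLOSE_E = 0.00003    # cutoff for very close homologs (from the text)
POSSIBLE_E = 500.0   # cutoff for possible homologs (from the text)


def split_homologs(hits):
    """Split (sequence_id, e_value) pairs into close and possible sets.

    Every close homolog also satisfies the looser cutoff, so the close
    list is a subset of the possible list.
    """
    hits = list(hits)
    close = [sid for sid, e in hits if e < CLOSE_E]
    possible = [sid for sid, e in hits if e < POSSIBLE_E]
    return close, possible
```

METHOD A hit with E=0.01, for example, would land only in the set of 
METHOD possible homologs. 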
METHOD  
METHOD The Target98 method then uses multiple iterations of a selection, 
METHOD training, and alignment procedure.  For each iteration it needs an 
METHOD initial alignment, a set of sequences to search, a threshold value, 
METHOD and a transition regularizer.  Alignments in the library were built 
METHOD with 4 iterations, with thresholds -40, -30, -24, and -16, but the 
METHOD target alignment was built with 6 iterations, with thresholds -50, 
METHOD -40, -30, -22, -16, and -14. 
METHOD  
METHOD On the first iteration the single sequence (or seed alignment) passed 
METHOD to the method is used as the initial alignment and the close homologs 
METHOD found by WU-BLAST are used as the search set.  The threshold is set 
METHOD very strictly, so that only very strong matches to the sequence are 
METHOD considered.  This iteration uses a transition regularizer designed 
METHOD to approximate the gap costs used by WU-BLAST. 
METHOD  
METHOD On subsequent iterations the input alignment is the output from the 
METHOD previous iteration and the search set is the larger set of possible 
METHOD homologs found by WU-BLAST.  The thresholds are gradually loosened. 
METHOD For the second through second-from-last iteration, a "long-match" 
METHOD transition regularizer is used, and for the final iteration a 
METHOD transition regularizer trained on FSSP structural alignments is used. 
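METHOD 
METHOD The per-iteration settings described above can be laid out as a 
METHOD schedule.  The Python sketch below only builds that schedule and does 
METHOD not run SAM; the function name, dictionary keys, and labels are 
METHOD hypothetical stand-ins for the actual search sets and regularizers. 

```python
def iteration_schedule(thresholds):
    """Pair each threshold with the search set and regularizer described.

    First iteration: close homologs, regularizer matched to WU-BLAST gap
    costs.  Middle iterations: possible homologs, long-match regularizer.
    Final iteration: possible homologs, FSSP-trained regularizer.
    With a single threshold, the first-iteration settings apply.
    """
    last = len(thresholds) - 1
    plan = []
    for i, threshold in enumerate(thresholds):
        if i == 0:
            search_set, regularizer = "close", "wu-blast-like"
        elif i == last:
            search_set, regularizer = "possible", "fssp-trained"
        else:
            search_set, regularizer = "possible", "long-match"
        plan.append({
            "iteration": i + 1,
            "threshold": threshold,
            "search_set": search_set,
            "regularizer": regularizer,
        })
    return plan


# Thresholds quoted in the text for the target alignment.
target_plan = iteration_schedule([-50, -40, -30, -22, -16, -14])
```

METHOD For the six-iteration target schedule this gives one strict first 
METHOD pass over the close homologs, four long-match passes, and a final 
METHOD pass with the FSSP-trained regularizer. 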
METHOD  
METHOD References 
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2): 95-107, 1996. 
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html.   
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. 
METHOD     Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and  
METHOD     Genetics, Suppl. 1, 134-139, 1997. 
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06, 
METHOD     Department of Computer Science, University of California, Santa Cruz, 1998. 
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, 
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998. 
METHOD [5] S. Henikoff and J. C. Henikoff, JMB, vol 243, pp 574-578, Nov 1994. 
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S. 
METHOD     Mian, and D. Haussler, CABIOS, vol 12, pp 327-345, Aug 1996. 
METHOD  
METHOD Results 
METHOD  
METHOD The Target98 method found no homologs in NRP for T52 other than 
METHOD itself, so the model built from the Target98 alignment is unlikely 
METHOD to be very powerful at finding remote homologs. 
METHOD  
METHOD The top-scoring possible homologs in PDB were as follows: 
METHOD     chain    score    found by model 
METHOD     1pmd     -6.21    t52 
METHOD     1hsq     -4.51    1hsq library model 
METHOD     1pdgA    -3.25    t52 
METHOD     1broA    -2.83    t52 
METHOD  
METHOD 1pmd did get a fairly good score, but its structural homologs 1btl, 
METHOD 2bltA, and 3pte did not, and -6.2 is in the range where the 
METHOD probability of a match being a false positive is about 70%.  The 
METHOD alignment of T52 to 1pmd was only moderately compact and included an 
METHOD unsupported helix at one end.  Also, the two known cystine bridges did 
METHOD not map to close positions in 1pmd, so we decided that this match was 
METHOD unlikely to be correct. 
METHOD  
METHOD The alignment of T52 to 1hsq seemed to match only a tiny fragment: 
METHOD 	WQPSNFIE 
METHOD 	WFPSNYVE 
METHOD with a few other very short matches scattered along the chain. 
METHOD This is a motif with a strand and tight turn or short helix.  While it 
METHOD is an interesting bit of secondary structure, it is too small to be 
METHOD suitable for fold recognition.  There is no similar motif in 1pmd. 
METHOD  
MODEL  1 
PARENT NONE 
TER 
END 
