PFRMAT TS 
TARGET T0043 
AUTHOR 9070-5088-8627 
REMARK  
REMARK Prediction date: Tuesday June 23, 1998 
REMARK Group name: UCSC-compbio 
REMARK Authors: Christian Barrett, Melissa Cline, Mark Diekhans, Kevin Karplus, 
REMARK 	 David Haussler and Richard Hughey 
REMARK University of California, Santa Cruz 
REMARK  
METHOD Overview 
METHOD  
METHOD Fold recognition was performed using the Target98 (SAM-T98) method 
METHOD [3] using SAM version 2.1.1 [1], a refinement of the methods developed 
METHOD by this group for CASP2 [2].  This method attempts to find and multiply  
METHOD align a set of homologs to a given sequence, then create an HMM from that  
METHOD multiple alignment. 
METHOD  
METHOD First, a set of sequence weights is determined from the alignment.  Next,  
METHOD modelfromalign is used to build the model from the alignment and the  
METHOD sequence weights.  Finally, hmmscore performs a local, all-paths scoring  
METHOD of the sequences, using a reversed-sequence normalization feature. 
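METHOD  
METHOD The reversed-sequence normalization can be sketched as follows.  This
METHOD is a minimal illustration, not SAM's hmmscore: the scorer toy_log_prob
METHOD and its table of favored residue pairs are hypothetical stand-ins for
METHOD an HMM, and the sign convention (higher is better) is an assumption
METHOD that need not match the scores reported later in this file.

```python
import math

def reverse_normalized_score(sequence, log_prob):
    """Score a sequence relative to its own reversal under the same model.

    The reversed sequence has the same length and composition, so the
    difference removes length and composition effects from the raw score.
    """
    return log_prob(sequence) - log_prob(sequence[::-1])

def toy_log_prob(sequence):
    """Hypothetical order-sensitive scorer standing in for an HMM."""
    favored = {("A", "L"), ("L", "K"), ("K", "E")}  # made-up residue pairs
    score = 0.0
    for pair in zip(sequence, sequence[1:]):
        score += math.log(0.2 if pair in favored else 0.05)
    return score

forward_like = reverse_normalized_score("ALKE", toy_log_prob)
palindrome = reverse_normalized_score("ALKLA", toy_log_prob)
```

METHOD A palindromic sequence scores exactly zero under this normalization,
METHOD because the sequence and its reversal are identical.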
METHOD  
METHOD The weighting method, detailed in upcoming publications [3,4], 
METHOD combines the Henikoffs' scheme [5], Dirichlet mixtures [6], and an 
METHOD entropy method to set the final weights. 
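METHOD  
METHOD As an illustration of the Henikoffs' position-based scheme [5] alone,
METHOD here is a minimal sketch.  It omits the Dirichlet mixtures and the
METHOD entropy step, treats a gap character as an ordinary residue type (an
METHOD assumption), and uses a made-up toy alignment.

```python
from collections import Counter

def henikoff_weights(alignment):
    """Position-based sequence weights for equal-length aligned sequences.

    In each column, a sequence receives 1 / (k * n), where k is the number
    of distinct residue types in the column and n is the number of
    sequences sharing this sequence's residue.  Weights are normalized to
    sum to 1.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):
        counts = Counter(column)
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]

# Toy alignment (hypothetical): two identical sequences, one near-duplicate,
# and one divergent sequence.
toy_alignment = ["ACDE", "ACDE", "ACDF", "GHIK"]
weights = henikoff_weights(toy_alignment)
```

METHOD The divergent sequence receives the largest weight and the two
METHOD identical sequences share the smallest, which is the intended effect
METHOD of down-weighting redundant homologs.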
METHOD  
METHOD Alignment generation 
METHOD  
METHOD The initial step uses BLASTP to search NRP twice: once to produce a set 
METHOD of very close homologs, and once to produce a set of possible homologs. 
METHOD  
METHOD The method then uses multiple iterations of a selection, training, and  
METHOD alignment procedure.  Each iteration involves an initial alignment, a set  
METHOD of search sequences, a threshold value, and a transition regularizer.  
METHOD  
METHOD The first iteration uses a single sequence (or seed alignment) as the  
METHOD initial alignment and the close homologs found by BLASTP are used as the  
METHOD search set.  The threshold is set very strictly, so that only good matches  
METHOD to the sequence are considered.  This iteration uses a transition regularizer  
METHOD that was designed to match the gap costs used by BLASTP. 
METHOD  
METHOD On subsequent iterations the input alignment is the output from the 
METHOD previous iteration, the search set is the larger set of possible 
METHOD homologs found by BLASTP, and the thresholds are gradually loosened. 
METHOD The second through second-from-last iterations use a ``long-match'' 
METHOD transition regularizer, and the final iteration uses a transition regularizer  
METHOD trained on FSSP alignments. 
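METHOD  
METHOD The per-iteration settings described above can be laid out as a small
METHOD schedule.  This is a sketch only: the function name, the string
METHOD labels, and the threshold values are hypothetical placeholders, not
METHOD the values used for this prediction.

```python
def iteration_schedule(thresholds):
    """Build one settings record per iteration from a list of thresholds.

    The first iteration searches the close homologs with a BLASTP-like
    regularizer; the last uses an FSSP-trained regularizer; any iterations
    in between use the long-match regularizer.
    """
    n = len(thresholds)
    if n < 2:
        raise ValueError("need at least a first and a final iteration")
    schedule = []
    for i, threshold in enumerate(thresholds):
        if i == 0:
            initial, search_set, regularizer = (
                "seed", "close_homologs", "blastp_like")
        elif i == n - 1:
            initial, search_set, regularizer = (
                "previous_output", "possible_homologs", "fssp_trained")
        else:
            initial, search_set, regularizer = (
                "previous_output", "possible_homologs", "long_match")
        schedule.append({
            "iteration": i + 1,
            "initial_alignment": initial,
            "search_set": search_set,
            "threshold": threshold,
            "regularizer": regularizer,
        })
    return schedule

# Placeholder thresholds, strict first and progressively looser.
schedule = iteration_schedule([-4.0, -3.0, -2.0, -1.0])
```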
METHOD  
METHOD References 
METHOD [1] R. Hughey and A. Krogh, CABIOS 12(2): 95-107, 1996. 
METHOD     http://www.cse.ucsc.edu/research/compbio/sam.html.   
METHOD [2] K. Karplus, K. Sjolander, C. Barrett, M. Cline, D. Haussler, R. 
METHOD     Hughey, L. Holm, and C. Sander, Proteins: Structure, Function, and  
METHOD     Genetics, Suppl. 1, 134-9, 1997. 
METHOD [3] K. Karplus, C. Barrett, and R. Hughey, Technical Report UCSC-CRL-98-06, 
METHOD     Department of Computer Engineering, Univ. of California, Santa Cruz, 1998. 
METHOD [4] J. Park, K. Karplus, C. Barrett, R. Hughey, D. Haussler, T. Hubbard, 
METHOD     and C. Chothia, http://cyrah.med.harvard.edu/~jong/assess_final.html, 1998. 
METHOD [5] S. Henikoff and J. C. Henikoff, J. Mol. Biol. 243: 574-578, 1994. 
METHOD [6] K. Sjolander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S. 
METHOD     Mian, and D. Haussler, CABIOS 12(4): 327-345, 1996. 
METHOD  
METHOD  
METHOD The best scoring hits for target 43 were as follows:  
METHOD         1ris 
METHOD         1enj and its homologs 2end, 1casA, and 1eni 
METHOD         1bv1 and its homolog 1btv 
METHOD         1aorA 
METHOD However, the best scores were a modest -4.34 and -4.21.  In that range, 
METHOD false positives outnumber true positives by approximately 8 to 1. 
METHOD When we loosened the thresholds for building the target model, we 
METHOD obtained additional hits to 1gclA and 1gcmA. 
METHOD  
METHOD The 1gcmA alignment suggests that EELLNHTQRIELQQGRVRK is a long helix. 
METHOD The somewhat better alignment to 2end suggests that the helix extends 
METHOD a bit further:  
METHOD  
METHOD T0043	VALETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD 
METHOD 	LSLGGGSLHHHHHHHHHHTTHHHHHHHHHHHTTLLGGG 
METHOD 2end	LTLVSELADQHLMAEYRELPRVFGAVRKHVANGKRVRD	 
METHOD  
METHOD (The 2end helix is locally distorted at a proline, which DSSP labels 
METHOD as a turn, but the axis of the helix does not change there.  T0043 
METHOD lacks the proline, so it most likely has a single continuous helix.) 
METHOD  
METHOD The 1ris alignment, on the other hand, suggests a strand for the 
METHOD second half of what the 2end alignment predicts as a helix:  
METHOD  
METHOD T0043	NHTQRIELQQGRVRKAERWGPRTLDLDI. 
METHOD 	HHHHHHHHTTLEEEEEEEEEEEEEEEEE 
METHOD 1ris	IIQRALENYGARVEKVEELGLRRLAYPI 
METHOD  
METHOD None of these remote-homolog hits is sufficient to make a fold 
METHOD prediction.  The residue identities are fairly high for both the 2end 
METHOD and 1ris alignments, yet the two alignments disagree, so even 
METHOD secondary structure prediction is difficult from them.  We favor the 
METHOD long-helix interpretation, because the proline (in WGPRT) seems more 
METHOD likely to be at the end of a helix than in the middle of a beta strand. 
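METHOD  
METHOD The residue identities of the two alignments shown above can be
METHOD counted directly.  This is a minimal sketch: the function name is
METHOD ours, the strings are copied from the alignments in this file, and
METHOD identity is taken over all aligned columns (neither alignment
METHOD contains gaps).

```python
def percent_identity(target, template):
    """Return (identical columns, percent identity) for a gapless alignment."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(a == b for a, b in zip(target, template))
    return matches, 100.0 * matches / len(target)

# T0043 aligned to 2end (first alignment above)
t0043_vs_2end = "VALETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD"
seq_2end = "LTLVSELADQHLMAEYRELPRVFGAVRKHVANGKRVRD"

# T0043 aligned to 1ris (second alignment above, trailing '.' dropped)
t0043_vs_1ris = "NHTQRIELQQGRVRKAERWGPRTLDLDI"
seq_1ris = "IIQRALENYGARVEKVEELGLRRLAYPI"

matches_2end, identity_2end = percent_identity(t0043_vs_2end, seq_2end)
matches_1ris, identity_1ris = percent_identity(t0043_vs_1ris, seq_1ris)
print(f"2end: {matches_2end} identical, {identity_2end:.1f}%")
print(f"1ris: {matches_1ris} identical, {identity_1ris:.1f}%")
```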
METHOD  
METHOD The sequence RVRKAER does seem to be a good candidate for a 
METHOD "chameleon" sequence that can be either helix or strand depending on 
METHOD environment. 
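METHOD  
METHOD The conflict over RVRKAER can be read off by transferring each
METHOD template's DSSP labels to the aligned T0043 residues.  This is a
METHOD sketch with a hypothetical helper name; the sequence and label
METHOD strings are copied from the two alignments in this file.

```python
def labels_for_motif(target, labels, motif):
    """Return the per-residue labels aligned to the first match of motif."""
    if len(target) != len(labels):
        raise ValueError("target and label strings must have the same length")
    start = target.find(motif)
    if start < 0:
        raise ValueError("motif not found in target")
    return labels[start:start + len(motif)]

motif = "RVRKAER"

# T0043 and the DSSP labels transferred from 2end
target_2end = "VALETSLAPEELLNHTQRIELQQGRVRKAERWGPRTLD"
dssp_2end = "LSLGGGSLHHHHHHHHHHTTHHHHHHHHHHHTTLLGGG"

# T0043 and the DSSP labels transferred from 1ris
target_1ris = "NHTQRIELQQGRVRKAERWGPRTLDLDI"
dssp_1ris = "HHHHHHHHTTLEEEEEEEEEEEEEEEEE"

from_2end = labels_for_motif(target_2end, dssp_2end, motif)
from_1ris = labels_for_motif(target_1ris, dssp_1ris, motif)
print(motif, "2end:", from_2end, "1ris:", from_1ris)
```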
METHOD  
MODEL  1 
PARENT NONE 
TER 
END 
