Genome 540 Homework Assignment 6

Due Sunday Feb 20

Here is the template:

<gs540_hw assignment='6' name='student name' email='student email'>
  <results>
    <result type='first line' file='filename'>
      The first line of the .fna file that you use for Viterbi training. NOTE: please do not change the filename or the first line in any way. Include all characters of the first line (including the '>' character).
    </result>
    <result type='viterbi iteration' iteration='1'>
      This result object should occur 10 times in your homework, once for each iteration. It should give information about the states in the Viterbi (highest-probability) path, together with the HMM parameter estimates that are derived from the Viterbi state sequence and used in the next iteration, as indicated below.
      <result type='state histogram'>
        State histograms should give, for each state, the state label (1 or 2), followed by an equals sign, followed by the number of positions in the sequence having that state in the Viterbi parse. Put a comma between the entries. For instance, if the sequence has length 8 and the Viterbi parse is 11222111, then your histogram should be:
        1=5,2=3
      </result>
      <result type='segment histogram'>
        Segment histograms should give, for each state, the state label, followed by an equals sign, followed by the number of segments consisting of that state. For instance, for the Viterbi parse above your histogram would be:
        1=2,2=1
      </result>
      <model type='hmm'>
        The model object should specify an HMM by giving state labels, initial state probabilities, state transition probabilities, and symbol emission probabilities.
        <states>
          Give your state labels, separated by commas:
          1,2
        </states>
        <initial_state_probabilities>
          Initial state probabilities should give, for each state, the state label and the probability of starting in that state (i.e. the probability of transitioning into that state from the begin state), separated by an equals sign. Entries should be separated by commas.
          1=0.900,2=0.100
        </initial_state_probabilities>
        <transition_probabilities state='1'>
          Transition probabilities should give, for each state, the state label and the probability of transitioning to that state from the state indicated in the attributes list. The present field (with state='1') gives the probabilities of transitioning from state 1 to states 1 and 2:
          1=0.990,2=0.010
        </transition_probabilities>
        <transition_probabilities state='2'>
          1=0.200,2=0.800
        </transition_probabilities>
        <emission_probabilities state='1'>
          For each symbol emitted by the state indicated in the attributes for this field, give the probability of emitting that symbol.
          A=.250,C=.250,G=.250,T=.250
        </emission_probabilities>
        <emission_probabilities state='2'>
          A=.300,C=.250,G=.250,T=.200
        </emission_probabilities>
      </model>
    </result>
    <result type='segment list'>
      Give the state 2 segments found in the Viterbi path after the last (10th) iteration, in the form (beginning of 1st segment,end of 1st segment),(beginning of 2nd segment,end of 2nd segment),... For example, if the Viterbi path is 11112221112111222 you would have
      (5,7),(11,11),(15,17)
    </result>
  </results>
  <analysis>
    Find the GenBank annotations for the 10 segments closest to the beginning of the genome sequence and report them as follows:
    <annotation segment='1'>
      Put the GenBank annotation of the feature here. You will need 10 of these objects.
    </annotation>
  </analysis>
  <program>
    <comments>
      Put comments about your code here.
    </comments>
    <file>
      File contents here.
    </file>
  </program>
</gs540_hw>
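
The state histogram, segment histogram, and segment list described in the template can all be derived from the Viterbi path once it is available as a string of state labels. The following is a minimal sketch in Python, not a required implementation. All function names are hypothetical, and it assumes that the path is a string such as 11222111 and that segment coordinates are 1-based and inclusive, which is what the (5,7),(11,11),(15,17) example in the template implies.

```python
from itertools import groupby


def state_histogram(path):
    """Count the positions assigned to each state label in the path."""
    counts = {}
    for state in path:
        counts[state] = counts.get(state, 0) + 1
    return counts


def segments(path):
    """Return (state, start, end) for each maximal run of one state.

    Coordinates are 1-based and inclusive (an assumption taken from the
    segment list example in the template).
    """
    result = []
    position = 1
    for state, run in groupby(path):
        length = sum(1 for _ in run)
        result.append((state, position, position + length - 1))
        position += length
    return result


def segment_histogram(path):
    """Count the number of segments (maximal runs) of each state."""
    counts = {}
    for state, _start, _end in segments(path):
        counts[state] = counts.get(state, 0) + 1
    return counts


def format_histogram(counts, states=("1", "2")):
    """Format counts as label=count entries separated by commas."""
    return ",".join(f"{s}={counts.get(s, 0)}" for s in states)


def format_segment_list(path, state="2"):
    """Format the segments of one state as (begin,end) pairs."""
    return ",".join(
        f"({start},{end})" for s, start, end in segments(path) if s == state
    )


if __name__ == "__main__":
    print(format_histogram(state_histogram("11222111")))
    print(format_histogram(segment_histogram("11222111")))
    print(format_segment_list("11112221112111222"))
```

Run on the two example paths from the template, the three print statements reproduce the example strings given there, which makes them a convenient sanity check before running on the full genome.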