Genome 540 Homework Assignment 7
Due Sunday Feb 29
- Write a program that implements a 2-state HMM for detecting regions of differing nucleotide composition in the genome sequence you used in assignments 1-4. Specifically:
- The starting parameter values should be as follows:
- Transition probabilities a_ij should be a_ii = .8 for i = 1,2, and a_ij = .2 for j not equal to i.
- Initiation probabilities for each state (i.e. the transition probabilities from the 'begin' state into state 1 or 2) should be .5; these can be held fixed throughout the Viterbi training.
- Emission probabilities should be
- e_A = e_T = .35, e_G = e_C = .15 for state 1;
- e_A = e_T = .15, e_G = e_C = .35 for state 2.
- Use Viterbi training as described in class to find improved parameter estimates. Run the training for 10 iterations, where for each iteration you:
- Use dynamic programming to find the highest probability underlying state sequence.
- Using this state sequence, compute
- The number of states of each of the two types (1 and 2), and the number of segments of each type (where a segment is a contiguous series of states of the same type that is bounded on each side by a state of the opposite type or by the beginning or end of the sequence).
- New transition probabilities and new emission probabilities, to be used in the next iteration.
- Your output should provide
- the name and first line of the .fna file
- the information described above in the Viterbi training step (i.e. the numbers of states and segments, and the new transition and emission probability values), for each of the 10 iterations. Give probabilities to 3 decimal places only.
- Email the output above to me and Chris. Please make it as compact as possible. Do NOT send the code itself. Include the output in the body of your email message (as plain text), NOT as an attachment.
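
The training loop described above can be sketched roughly as follows. This is a minimal, unoptimized illustration, not the reference solution from class. It rests on several assumptions that the assignment does not state: the sequence is non-empty, characters other than A, C, G and T are skipped, a zero probability is treated as log(0) = -infinity, a state that is never visited keeps its previous parameters, and the report layout is invented for illustration. All function names (`safe_log`, `viterbi`, `count_path`, `normalize`, `viterbi_training`, `format_report`) are hypothetical. Reading the .fna file is left out.

```python
import math

STATES = (1, 2)
BASES = "ACGT"

def safe_log(p):
    """log(p), with log(0) treated as -infinity instead of raising (assumption)."""
    return math.log(p) if p > 0 else float("-inf")

def initial_params():
    """Starting values given in the assignment."""
    init = {1: 0.5, 2: 0.5}  # held fixed throughout training
    trans = {1: {1: 0.8, 2: 0.2}, 2: {1: 0.2, 2: 0.8}}
    emit = {1: {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
            2: {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35}}
    return init, trans, emit

def viterbi(seq, init, trans, emit):
    """Highest-probability state path, computed in log space to avoid underflow."""
    score = {s: safe_log(init[s]) + safe_log(emit[s][seq[0]]) for s in STATES}
    back = []  # back[t - 1][s] is the best predecessor of state s at position t
    for x in seq[1:]:
        ptr, new = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + safe_log(trans[r][s]))
            ptr[s] = prev
            new[s] = score[prev] + safe_log(trans[prev][s]) + safe_log(emit[s][x])
        back.append(ptr)
        score = new
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def count_path(seq, path):
    """Count states, segments, transitions and emissions along a state path."""
    n_states = {s: 0 for s in STATES}
    n_segments = {s: 0 for s in STATES}
    trans_counts = {r: {s: 0 for s in STATES} for r in STATES}
    emit_counts = {s: {b: 0 for b in BASES} for s in STATES}
    prev = None
    for x, s in zip(seq, path):
        n_states[s] += 1
        emit_counts[s][x] += 1
        if prev != s:  # a new segment starts at the first position or on a switch
            n_segments[s] += 1
        if prev is not None:
            trans_counts[prev][s] += 1
        prev = s
    return n_states, n_segments, trans_counts, emit_counts

def normalize(counts, fallback):
    """Turn counts into probabilities."""
    total = sum(counts.values())
    if total == 0:  # assumption: a never-visited state keeps its old values
        return dict(fallback)
    return {k: c / total for k, c in counts.items()}

def viterbi_training(seq, iterations=10):
    """Run Viterbi training and return one report tuple per iteration."""
    seq = "".join(b for b in seq.upper() if b in BASES)  # assumption: skip non-ACGT
    init, trans, emit = initial_params()
    reports = []
    for it in range(1, iterations + 1):
        path = viterbi(seq, init, trans, emit)
        n_states, n_segments, tc, ec = count_path(seq, path)
        trans = {s: normalize(tc[s], trans[s]) for s in STATES}
        emit = {s: normalize(ec[s], emit[s]) for s in STATES}
        reports.append((it, n_states, n_segments, trans, emit))
    return reports

def format_report(report):
    """One compact text block per iteration (layout is a hypothetical choice)."""
    it, n_states, n_segments, trans, emit = report
    lines = [f"Iteration {it}"]
    for s in STATES:
        a = " ".join(f"a_{s}{r}={trans[s][r]:.3f}" for r in STATES)
        e = " ".join(f"e_{b}={emit[s][b]:.3f}" for b in BASES)
        lines.append(f"state {s}: n={n_states[s]} segments={n_segments[s]} {a} {e}")
    return "\n".join(lines)
```

Calling `viterbi_training(sequence)` on the genome string and printing `format_report(r)` for each returned report would give the per-iteration counts and probabilities to 3 decimal places. The dictionary-based back pointers are easy to read but memory-hungry; for a full genome a more compact representation (for example one byte per position and state) is likely to be needed.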