Genome 540 Homework Assignment 7
Due Sunday Feb 27
- Using the same HMM and dataset as in homework 6, write a program that implements EM (Baum-Welch) training instead of Viterbi training.
Use the same starting parameter values. Unlike homework 6, do not hold any parameter values fixed -- allow all of them to change with each iteration. Compute the base-2 log-likelihood of the sequence at each iteration. Run the program until the increase in log-likelihood between successive iterations is less than 0.1. Check that the log-likelihood increases with each iteration; if it doesn't, something is wrong with your program.
- Your output should provide:
  - the name and first line of the .fna file
  - the number of iterations until convergence
  - the final log-likelihood
  - the final emission and transition probabilities
- You must turn in your results and your computer program, using the template (file format) described below.
Please put everything into ONE plain text file. Do not send an archive of files, a tar file, or a word-processing document. Compress the file with either Unix compress or gzip (if you don't have access to either of these programs, let us know). Send it as an attachment to both Phil at phg@u.washington.edu and Tobias at mann@gs.washington.edu.
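Below is a minimal sketch of one way to structure the EM loop. It is not a reference solution. It makes four assumptions:

- a scaled forward-backward pass, in which each forward column is rescaled to sum to 1 so long genomic sequences do not underflow;
- a sequence already reduced to uppercase A/C/G/T;
- placeholder starting parameters laid out like the template rows, not the homework 6 values;
- the hypothetical names `forward_backward`, `baum_welch`, `tol` and `max_iter`.

Under this scaling, the log2-likelihood is the sum of the log2 scaling factors.

```python
import math


def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass for a discrete HMM.

    Returns per-position state posteriors (gamma), expected transition
    counts summed over positions (xi_sum), and log2 P(seq).
    """
    n = len(init)
    T = len(seq)
    alpha = [[0.0] * n for _ in range(T)]
    scale = [0.0] * T
    # Forward pass: rescale each column to sum to 1 to avoid underflow.
    for t, sym in enumerate(seq):
        for j in range(n):
            if t == 0:
                prior = init[j]
            else:
                prior = sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
            alpha[t][j] = prior * emit[j][sym]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]
    # P(seq) is the product of the scaling factors, so log2 P is their log2 sum.
    log2_lik = sum(math.log2(c) for c in scale)
    # Backward pass, rescaled with the same factors as the forward pass.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        nxt = seq[t + 1]
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][nxt] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        nxt = seq[t + 1]
        for i in range(n):
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * trans[i][j] * emit[j][nxt]
                                 * beta[t + 1][j] / scale[t + 1])
    return gamma, xi_sum, log2_lik


def baum_welch(seq, init, trans, emit, tol=0.1, max_iter=500):
    """Re-estimate ALL parameters until the log2-likelihood gain is < tol.

    max_iter is a hypothetical safety cap, not part of the assignment.
    Returns the final parameters and the log2-likelihood at each iteration.
    """
    n = len(init)
    alphabet = sorted(emit[0])
    history = []
    for _ in range(max_iter):
        gamma, xi_sum, ll = forward_backward(seq, init, trans, emit)
        history.append(ll)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break  # converged; init/trans/emit are the parameters that gave ll
        # Initial probabilities: posteriors at the first position.
        init = gamma[0][:]
        # Transitions: expected i->j counts, normalized per source state.
        trans = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)]
                 for i in range(n)]
        # Emissions: expected symbol counts per state, normalized.
        new_emit = []
        for i in range(n):
            counts = dict.fromkeys(alphabet, 0.0)
            for t, sym in enumerate(seq):
                counts[sym] += gamma[t][i]
            total = sum(counts.values())
            new_emit.append({s: c / total for s, c in counts.items()})
        emit = new_emit
    return init, trans, emit, history


if __name__ == "__main__":
    # Placeholder values in the template's layout, NOT the homework 6 values.
    init = [0.9, 0.1]
    trans = [[0.99, 0.01], [0.2, 0.8]]
    emit = [dict(A=0.25, C=0.25, G=0.25, T=0.25),
            dict(A=0.30, C=0.25, G=0.25, T=0.20)]
    seq = "AAAATTTTAAAACGCGCGGCGCATATTA" * 5  # toy sequence
    init, trans, emit, history = baum_welch(seq, init, trans, emit)
    print("iterations:", len(history))
    print("final log2-likelihood: %.3f" % history[-1])
```

Here `len(history)` counts likelihood evaluations, including the final one that triggers convergence. State in your comments which counting convention you use. On a full genome, pure Python will be slow; vectorizing the inner loops (for example with NumPy) is one option.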
Here is the template.
name of the .fna file that you use for Baum-Welch training.
first line of the .fna file. NOTE: please do not change the filename or the first line in
any way. Please include all characters of the first line (including the '>' character).
put the number of iterations it took for the algorithm
to converge here.
put the final log likelihood here.
1,2
1=0.900,2=0.100
1=0.990,2=0.010
1=0.200,2=0.800
A=.250,C=.250,G=.250,T=.250
A=.300,C=.250,G=.250,T=.200
put comments about your code here
put the contents of your program's source file here
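The template is read line by line, so it may help to generate those lines from the fitted parameters rather than typing them by hand. This sketch rests on an assumption inferred from the layout rather than stated above. It reads the numeric rows, in order, as:

- the state list;
- the initial-state probabilities;
- one transition row per state;
- one emission row per state.

It writes the .fna filename before its first line, because the output list asks for both. `format_results` is a hypothetical helper. The comments and program-source sections are left for you to append.

```python
def format_results(fna_name, first_line, n_iter, log2_lik, init, trans, emit):
    """Render results in the template's line layout.

    Assumed row roles: states, initial probs, one transition row per
    state, then one emission row per state.
    """
    states = [str(i + 1) for i in range(len(init))]

    def prob_row(probs):
        # [0.99, 0.01] -> "1=0.990,2=0.010"
        return ",".join(f"{s}={p:.3f}" for s, p in zip(states, probs))

    def emit_row(dist):
        # {"A": 0.25, ...} -> "A=.250,C=.250,G=.250,T=.250"
        cells = []
        for b in "ACGT":
            value = format(dist[b], ".3f").lstrip("0")  # 0.250 -> .250
            cells.append(b + "=" + value)
        return ",".join(cells)

    lines = [fna_name, first_line, str(n_iter), f"{log2_lik:.3f}",
             ",".join(states), prob_row(init)]
    lines += [prob_row(row) for row in trans]
    lines += [emit_row(dist) for dist in emit]
    return "\n".join(lines)
```

Emission values drop the leading zero to match the template's emission lines. Transition values keep it, as the template does.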