Genome 540 Homework Assignment 7
Due Wednesday Mar 6, 11:59pm
- Using the same HMM and dataset as in homework 6, write a program that implements EM (Baum-Welch) training instead.
Use the same starting parameter values, but in contrast to homework 6, you should not hold any parameter values fixed -- allow all of them to change with each iteration. Compute the log-likelihood (to the base 2) of the sequence at each iteration, and run the program until the increase in log-likelihood between successive iterations becomes less than 0.1. You should check that the log-likelihood increases with each iteration -- if it doesn't, something is wrong with your program. Your program should take less than 250 iterations to converge.
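The loop described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference solution: the function names (`forward_backward`, `em_step`, `train`), the list/dict parameter layout, and the convention that an "iteration" is one log-likelihood evaluation are all hypothetical, and the homework 6 model and starting values are not reproduced here. Per-position scaling is used in place of raw probabilities so that long sequences do not underflow.

```python
import math

def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass; returns (alpha, beta, scale, log2-likelihood)."""
    n, T = len(init), len(seq)
    alpha, scale = [], []
    for t in range(T):
        row = []
        for j in range(n):
            prior = init[j] if t == 0 else sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
            row.append(prior * emit[j][seq[t]])
        scale.append(sum(row))                      # P(x_t | x_1..x_{t-1})
        alpha.append([a / scale[t] for a in row])   # rescaled so each row sums to 1
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    # The sequence probability is the product of the scale factors.
    return alpha, beta, scale, sum(math.log2(c) for c in scale)

def em_step(seq, init, trans, emit):
    """One Baum-Welch update; also returns the log2-likelihood of the INPUT parameters."""
    n, T = len(init), len(seq)
    alpha, beta, scale, loglik = forward_backward(seq, init, trans, emit)
    # Posterior state probabilities at each position.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    # Expected transition counts.
    counts = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                counts[i][j] += (alpha[t][i] * trans[i][j] * emit[j][seq[t + 1]]
                                 * beta[t + 1][j] / scale[t + 1])
    new_trans = [[c / sum(row) for c in row] for row in counts]
    # Expected emission counts.
    ecounts = [dict.fromkeys(emit[i], 0.0) for i in range(n)]
    for t in range(T):
        for i in range(n):
            ecounts[i][seq[t]] += gamma[t][i]
    new_emit = [{s: c / sum(row.values()) for s, c in row.items()} for row in ecounts]
    return gamma[0][:], new_trans, new_emit, loglik

def train(seq, init, trans, emit, tol=0.1, max_iter=250):
    """Iterate until the log2-likelihood gain drops below tol (assumed stopping rule)."""
    history = []
    for _ in range(max_iter):
        new_init, new_trans, new_emit, loglik = em_step(seq, init, trans, emit)
        if history:
            if loglik < history[-1] - 1e-9:
                raise RuntimeError("log-likelihood decreased; check the implementation")
            if loglik - history[-1] < tol:
                history.append(loglik)
                break                               # keep the parameters just evaluated
        history.append(loglik)
        init, trans, emit = new_init, new_trans, new_emit
    return init, trans, emit, history
```

Here `len(history)` gives the iteration count under the assumed convention and `history[-1]` is the final log-likelihood. The emission tables are dicts keyed by symbol, so a sequence containing a character with no emission entry would raise a `KeyError`.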
- Your output should provide:
- the name and first line of the .fna file
- the number of iterations until convergence
- the final log-likelihood
- the final initial, emission, and transition probabilities
-- please output these in scientific notation, to four significant digits (e.g., 9.000e-1)
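As a small illustration of the number format, four significant digits in scientific notation correspond to three digits after the decimal point. The helper name `sci` below is hypothetical. Note that Python always prints at least two exponent digits, so the result differs slightly from the one-digit exponent in the example above; whether that difference matters is not stated in the assignment.

```python
def sci(value):
    """Format a probability in scientific notation with four significant digits."""
    return format(value, ".3e")

print(sci(0.9))          # 9.000e-01
print(sci(0.000123456))  # 1.235e-04
```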
- You must turn in your results and your computer program, using this template file.
Please put everything into ONE plain text file; do not send an archive of files, a tar file, or a word processing document.
Compress it using either Unix compress or gzip (if you don't have access to either of these programs, let us know), and send it as an attachment to both Phil (phg (at) u.washington.edu) and Eliah (eliah (at) uw.edu).