Genome 540 Homework Assignment 6
Due Sunday Feb 22
- Using the same HMM and dataset as in homework 5, write a program that implements EM (Baum-Welch) training instead.
Use the same starting parameter values, but in contrast to homework 5, you should not hold any parameter values fixed -- allow all of them to change with each iteration. Compute the log-likelihood (to the base 2) of the sequence at each iteration, and run the program until the increase in log-likelihood between successive iterations becomes less than .1. You should check that the loglikelihood increases with each iteration -- if it doesn't, something is wrong with your program.
- The Viterbi algorithm used in the previous homework involved only multiplication of probabilities, so it was possible to do all computation in log space simply by replacing multiplication with addition.
However, the Baum-Welch algorithm involves addition of probabilities.
See this document for a discussion of how to handle this.
Briefly, there are two ways to avoid underflow: (1) multiply the probabilities at each position by a constant factor and (2) use log probabilities and sum probabilities in a way that does not involve taking the exp() of an extremely small number.
We recommend method (2).
In order to compute the sum of two log probabilities B and A, replace exp(A)+exp(B) (which would underflow) with A+log(1+exp(B-A)) (which is much less prone to underflow).
You must also be careful to handle log(0) appropriately.
- Your output should provide
- the name and first line of the .fna file
- the number of iterations until convergence
- the final log-likelihood (three decimal places).
- the final emission and transition probabilities
-- please output these in scientific notation, to four significant digits (i.e., 9.000e-1)
- You must turn in your results and your computer
program, using this template file .
Please put everything into ONE plain text file - do not send an archive
of files or a tar file, or a word processing document file. Compress it (using either Unix compress, or gzip -- if
you don't have access to either of these programs let us know), and
send it as an attachment to both Phil (phg (at) u.washington.edu) and
Max (maxwl (at) cs.washington.edu). Name your file
"[your first initial][your last name]_hw5.txt.[compression extension (i.e. .gz)]".