Genome 540 Homework Assignment 6
Due Thursday Feb 25
- Using the same HMM and dataset as in homework 5, write a program that implements EM (Baum-Welch) training instead.
Use the same starting parameter values but, in contrast to homework 5, do not hold any parameter values fixed -- allow all of them to change with each iteration. Compute the log-likelihood (base 2) of the sequence at each iteration, and run the program until the increase in log-likelihood between successive iterations falls below 0.1. Check that the log-likelihood increases with each iteration -- if it doesn't, something is wrong with your program.
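The training loop described above can be sketched as follows. This is only an illustrative outline, not the required solution: the function names, the integer encoding of symbols, and the toy sequence and starting values are all hypothetical (they are not the homework 5 model or data). It also assumes that the initial-state probabilities are re-estimated along with the other parameters, and it counts every forward-backward pass as one iteration; adjust both to match your own conventions.

```python
import math

def forward_backward(seq, init, trans, emit):
    """Scaled forward-backward pass; returns alpha, beta, scale, log2-likelihood."""
    n, k = len(seq), len(init)
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[1.0] * k for _ in range(n)]
    scale = [0.0] * n
    for t in range(n):
        for j in range(k):
            if t == 0:
                prev = init[j]
            else:
                prev = sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
            alpha[t][j] = prev * emit[j][seq[t]]
        scale[t] = sum(alpha[t])                      # P(x_t | x_1..x_{t-1})
        alpha[t] = [a / scale[t] for a in alpha[t]]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(k)) / scale[t + 1]
    loglik = sum(math.log2(s) for s in scale)         # log base 2, as required
    return alpha, beta, scale, loglik

def em_step(seq, init, trans, emit):
    """One Baum-Welch update; loglik is for the parameters passed in."""
    alpha, beta, scale, loglik = forward_backward(seq, init, trans, emit)
    n, k, m = len(seq), len(init), len(emit[0])
    a_cnt = [[0.0] * k for _ in range(k)]             # expected transition counts
    e_cnt = [[0.0] * m for _ in range(k)]             # expected emission counts
    for t in range(n):
        for i in range(k):
            e_cnt[i][seq[t]] += alpha[t][i] * beta[t][i]
            if t + 1 < n:
                for j in range(k):
                    a_cnt[i][j] += (alpha[t][i] * trans[i][j] *
                                    emit[j][seq[t + 1]] * beta[t + 1][j] / scale[t + 1])
    new_init = [alpha[0][i] * beta[0][i] for i in range(k)]   # assumption: re-estimated
    new_trans = [[c / sum(row) for c in row] for row in a_cnt]
    new_emit = [[c / sum(row) for c in row] for row in e_cnt]
    return new_init, new_trans, new_emit, loglik

def baum_welch(seq, init, trans, emit, tol=0.1, max_iter=1000):
    """Iterate until the log2-likelihood increase drops below tol."""
    prev = None
    for it in range(1, max_iter + 1):
        new_init, new_trans, new_emit, ll = em_step(seq, init, trans, emit)
        if prev is not None:
            if ll < prev - 1e-9:                      # sanity check from the assignment
                raise RuntimeError("log-likelihood decreased")
            if ll - prev < tol:
                return it, ll, init, trans, emit      # parameters that produced ll
        prev = ll
        init, trans, emit = new_init, new_trans, new_emit
    raise RuntimeError("did not converge within max_iter iterations")

# Hypothetical toy input, NOT the homework data or starting values.
toy = ["ACGT".index(c) for c in "ATATTAATGCGCGGCCGCATATTTAA"]
iters, ll, init, trans, emit = baum_welch(
    toy, [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]],
    [[0.3, 0.2, 0.2, 0.3], [0.2, 0.3, 0.3, 0.2]])
print(iters, ll)
```

Rescaling the forward and backward values at every position keeps them from underflowing on a genome-length sequence, and the log-likelihood falls out as the sum of the log scale factors.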
- Your output should provide:
  - the name and first line of the .fna file
  - the number of iterations until convergence
  - the final log-likelihood
  - the final emission and transition probabilities (please output these in scientific notation, if possible)
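As a small illustration of the scientific-notation request, the sketch below formats a probability table with Python's `e` format type. The helper names, the row labels, and the sample values are hypothetical; the actual layout of your results should follow the template file.

```python
def sci(x, digits=3):
    """Format a number in scientific notation with the given number of decimals."""
    return f"{x:.{digits}e}"

def format_table(name, rows):
    """Render one probability matrix, one state per line (hypothetical layout)."""
    lines = [name]
    for i, row in enumerate(rows):
        lines.append(f"  state {i + 1}: " + " ".join(sci(p) for p in row))
    return "\n".join(lines)

# Made-up values for illustration only.
print(format_table("transitions", [[0.9999, 0.0001], [0.01, 0.99]]))
```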
- You must turn in your results and your computer program, using this template file.
Please put everything into ONE plain text file -- do not send an archive of files, a tar file, or a word processing document. Compress it (using either Unix compress or gzip -- if you don't have access to either of these programs, let us know), and send it as an attachment to both Phil and Alan.
(The XML file includes a DTD, which specifies the XML file format. Place the DTD at the beginning of your XML document. When you are done with your XML, check that it conforms to the DTD using this website, and resolve any errors before turning it in.)