Genome 540 Homework Assignment 6
Due Saturday Feb 18
- Using the same HMM and dataset as in homework 5, write a program that implements EM (Baum-Welch) training instead.
Use the same starting parameter values, but in contrast to homework 5, do not hold any parameter values fixed; allow all of them to change with each iteration. Compute the log-likelihood (base 2) of the sequence at each iteration, and run the program until the increase in log-likelihood between successive iterations is less than 0.1. Check that the log-likelihood increases with each iteration; if it does not, something is wrong with your program.
- Your output should provide:
  - the name and first line of the .fna file
  - the number of iterations until convergence
  - the final log-likelihood
  - the final emission and transition probabilities
- You must turn in your results and your computer program, using this template file.
Please put everything into ONE plain text file; do not send an archive of files, a tar file, or a word processing document. Compress it using either Unix compress or gzip (if you don't have access to either of these programs, let us know), and send it as an attachment to both Phil and Aaron.
(The XML file includes a DTD, which specifies the XML file format. Place the DTD at the beginning of your XML document. When you are done with your XML, check that it conforms to the DTD using this website: http://www.w3schools.com/dom/dom_validate.asp and resolve any errors before turning it in.)
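
The EM loop and stopping rule described above can be sketched as follows. This is only an illustrative outline, not the required solution: the two-state model, the starting probabilities, and the short toy sequence are hypothetical stand-ins for the homework 5 HMM and .fna data, and the function names (`forward_backward`, `em_step`, `baum_welch`) are made up for this sketch. It also assumes that the initial-state probabilities are re-estimated along with everything else, and it uses per-position scaling so that the base-2 log-likelihood can be accumulated without underflow.

```python
import math

def forward_backward(obs, init, trans, emit):
    """Scaled forward-backward pass; returns log2-likelihood and scaled tables."""
    n, k = len(obs), len(init)
    alpha, scale = [], []
    for t in range(n):
        row = []
        for j in range(k):
            prev = init[j] if t == 0 else sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
            row.append(prev * emit[j][obs[t]])
        c = sum(row)                      # scaling factor for position t
        scale.append(c)
        alpha.append([a / c for a in row])
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(k)) / scale[t + 1]
    loglik = sum(math.log2(c) for c in scale)  # log2 P(sequence | current parameters)
    return loglik, alpha, beta, scale

def em_step(obs, init, trans, emit):
    """One Baum-Welch iteration: expected counts, then re-normalized parameters."""
    loglik, alpha, beta, scale = forward_backward(obs, init, trans, emit)
    n, k, m = len(obs), len(init), len(emit[0])
    t_cnt = [[0.0] * k for _ in range(k)]
    e_cnt = [[0.0] * m for _ in range(k)]
    for t in range(n):
        for i in range(k):
            e_cnt[i][obs[t]] += alpha[t][i] * beta[t][i]   # posterior state probability
            if t + 1 < n:
                for j in range(k):
                    t_cnt[i][j] += (alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]]
                                    * beta[t + 1][j] / scale[t + 1])
    new_init = [alpha[0][i] * beta[0][i] for i in range(k)]  # assumption: also re-estimated
    new_trans = [[c / sum(row) for c in row] for row in t_cnt]
    new_emit = [[c / sum(row) for c in row] for row in e_cnt]
    return loglik, new_init, new_trans, new_emit

def baum_welch(obs, init, trans, emit, tol=0.1, max_iter=1000):
    """Iterate until the log2-likelihood gain drops below tol (max_iter is a safety cap)."""
    history = []
    for _ in range(max_iter):
        loglik, new_init, new_trans, new_emit = em_step(obs, init, trans, emit)
        if history and loglik < history[-1] - 1e-9:
            raise RuntimeError("log-likelihood decreased; something is wrong")
        converged = bool(history) and loglik - history[-1] < tol
        history.append(loglik)
        if converged:
            break  # parameters returned are the ones that produced history[-1]
        init, trans, emit = new_init, new_trans, new_emit
    return history, init, trans, emit

# Hypothetical toy input; the real assignment uses the homework 5 model and data.
seq = "ATATTATAATGCGCGGCGCCGCATATTAATAT"
obs = ["ACGT".index(ch) for ch in seq]
history, init, trans, emit = baum_welch(
    obs,
    init=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[[0.3, 0.2, 0.2, 0.3], [0.2, 0.3, 0.3, 0.2]],
)
print("likelihood evaluations:", len(history))
print("final log2-likelihood:", history[-1])
```

The monotonicity check inside `baum_welch` mirrors the sanity check the assignment asks for. How evaluations of the likelihood map onto "the number of iterations until convergence" is a counting convention this sketch does not settle.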