Genome 540 Homework Assignment 3

Due Sunday Jan. 30

  1. Write a program to find a highest-weight path in a weighted directed acyclic graph. This program should
  2. Write a program that
  3. For a genome sequence: Your final output should include
  4. For the homework assignment, you should run your programs on the genome sequence of Thermococcus kodakaraensis. To check that your program is working correctly, first run it on the test example, Pyrococcus furiosus, and confirm that you get the correct answer given below.
  5. You must turn in your results and your computer program using the template (file format) described below. Please put everything into ONE plain text file; do not send an archive of files (such as a tar file) or a word-processing document. Compress the file with either Unix compress or gzip (if you don't have access to either program, let us know), and send it as an attachment to both Phil at phg@u.washington.edu and Tobias at mann@gs.washington.edu.
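Item 1 can be solved with dynamic programming over a topological order of the graph. The sketch below is one possible approach, not a prescribed solution. It assumes the graph is supplied as a list of (u, v, weight) edges and that a path may start and end at any node (a single node counts as a path of weight 0), because the input format and any required endpoints are not given here. The function name highest_weight_path is hypothetical.

```python
from collections import defaultdict, deque


def highest_weight_path(edges):
    """Return (weight, path) for a highest-weight path in a weighted DAG.

    Assumptions (hypothetical input format): `edges` is a non-empty list of
    (u, v, weight) tuples, and a path may start and end at any node.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        successors[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))

    best = {n: 0.0 for n in nodes}   # weight of best path ending at n (0 = start here)
    back = {n: None for n in nodes}  # predecessor of n on that best path

    # Kahn's algorithm: a node is dequeued only after all its predecessors,
    # so best[u] is final before u's outgoing edges are relaxed.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, w in successors[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                back[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Trace back from the node where the best path ends.
    end = max(order, key=best.get)
    path = [end]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    path.reverse()
    return best[end], path
```

For example, with edges s->a (3), a->b (-1), b->c (4), s->c (1) and c->d (-5), the best path is s, a, b, c with weight 6. Each node and edge is handled once, so the search runs in time linear in the size of the graph.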
Here is the template:

<gs540_hw assignment='3' name='student name' email='student email'>
  <results>
    <result type='first line' file='filename'>
      First line of the .fna file that you use to find the highest-scoring segment.
    </result>
    <result type='nucleotide histogram' file='filename'>
      Nucleotide histograms should give, for each base or 'ambiguity code' occurring in the sequence, the letter denoting the base, followed by an equals sign, followed by an integer giving the number of times the base occurs in the sequence. Put a comma between the different bases. For instance, A=50,C=50,G=50,T=50,N=2
    </result>
    <result type='DNA sequence'>
      <location file='filename' strand='forward'>
        For this assignment, use the forward strand. Put the starting base of the highest-scoring segment here, followed by a dash and then the ending base, in forward-strand coordinates. For example: 50-100
      </location>
      <scoring_system>
        This should look like a nucleotide histogram, except that the score values may be any real numbers, e.g. A=1.3,T=1.1,G=-2.7,C=-2.7
      </scoring_system>
      <score>
        Put the score of the highest-scoring segment here.
      </score>
      The DNA sequence of the highest-scoring segment should go here.
    </result>
  </results>
  <analysis>
    You should identify the segment (by checking the GenBank annotations) and put your identification here.
  </analysis>
  <program>
    <comments>
      Any comments about your code or files should go here.
    </comments>
    <file name='filename'>
      File contents here.
    </file>
  </program>
</gs540_hw>

Below is an example of a homework file with the fields filled in with the correct answers for a test case.
<gs540_hw assignment='3' name='Tobias Mann' email='mann@gs.washington.edu'>
  <results>
    <result type='first line' file='NC_003413.fna'>
      >gi|18976372|ref|NC_003413.1| Pyrococcus furiosus DSM 3638, complete genome
    </result>
    <result type='nucleotide histogram' file='NC_003413.fna'>
      A=565156, C=388629, T=565106, G=389365
    </result>
    <result type='DNA sequence'>
      <location file='NC_003413.fna' strand='forward'>
        9570-9890
      </location>
      <scoring_system>
        A=-1.49,T=-1.49,G=0.74,C=0.74
      </scoring_system>
      <score>
        50.22
      </score>
      GGCGGCGGGCTAGGCCGGGGGGTTCGGCGTCCCCTGTAACCGGAAACCGCCGATATGCCG
      GGGCCGAAGCCCGGGGGGCGGTTCCCAAAGCCGCTCCCAGAAGCCGAGGTCGAACGATGA
      GTCCTCGTCCCGCGGGGTGCCCGGTGGGGGAGGCACGGCTGAAGGGCCGTGCTAACCCCC
      TTTGGGCCCCGAACCCCGCAAGGCCCGGAAGGGAGCAGCGGTAGGGGCCACGGAGCACGC
      TCGCGGGGGTGCGGGGATGAGATAGGCCTCGGTGGATGGGAGCGGTGGAGGGTTCCCACC
      CTCGGGCGTGCCCGCCGCCGC
    </result>
  </results>
  <analysis>
    This is a structural RNA gene, annotated as a signal recognition particle. The annotated gene starts at 9570 and ends at 9892, which includes two bases past the end of the maximal-scoring segment. The actual sequence ends: CCGCTA. The final 'TA' was not included in the maximal-scoring segment because those two bases would decrease the score.
  </analysis>
  <program>
    <comments>
      Run the bash file. It calls the Python script, which writes results to standard out.
    </comments>
    <file name='hw2.bash'>
python hello_world.py
    </file>
    <file name='hello_world.py'>
"""
This script writes 'hello world' to the standard out
"""
if __name__ == '__main__':
    print("hello world")
    </file>
  </program>
</gs540_hw>
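The 'first line', 'nucleotide histogram', and 'scoring_system' fields above are simple to produce or read: the first line is the FASTA header, and the other two share a comma-separated letter=value format. A minimal sketch of helpers for those fields follows. The function names are hypothetical, and it assumes each .fna file holds a single FASTA record.

```python
from collections import Counter


def read_fasta(path):
    """Return (first_line, sequence) for a single-record FASTA (.fna) file.

    Assumption: one record per file; a second '>' header would be concatenated.
    """
    with open(path) as fh:
        first_line = fh.readline().rstrip("\n")
        sequence = "".join(line.strip() for line in fh)
    return first_line, sequence


def histogram_string(sequence):
    """Count every letter (bases and ambiguity codes) as 'A=2,C=2,...'.

    Letters are listed alphabetically; the template's examples use varying orders.
    """
    counts = Counter(sequence)
    return ",".join(f"{base}={n}" for base, n in sorted(counts.items()))


def parse_scores(spec):
    """Parse a scoring system such as 'A=1.3,T=1.1,G=-2.7,C=-2.7' into a dict."""
    scores = {}
    for item in spec.split(","):
        base, value = item.split("=")
        scores[base.strip()] = float(value)
    return scores
```

For the test case, `read_fasta("NC_003413.fna")` followed by `histogram_string` on the returned sequence should reproduce the counts shown in the example, apart from letter order and spacing.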
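The example's location, score, and analysis (the trailing 'TA' would lower the score) can be checked with a maximal-segment scan. The sketch below uses a single linear pass, Kadane's algorithm, named here as a swapped-in technique rather than the assignment's prescribed method. It is equivalent to a highest-weight path in a DAG whose nodes are the positions 0..n between bases, where the edge from i-1 to i carries the score of base i and a path may start and end at any node. The function name is hypothetical, and scoring ambiguity codes as 0 is an assumption.

```python
def max_scoring_segment(seq, scores):
    """Return (score, start, end) of a maximal-scoring segment of `seq`.

    start/end are 1-based and inclusive, like '9570-9890'. Assumption: bases
    absent from `scores` (e.g. ambiguity codes such as N) score 0. If no
    segment has a positive score, returns (0.0, 0, 0).
    """
    best, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 1
    for i, base in enumerate(seq, start=1):
        if running <= 0:
            # A non-positive running sum cannot help any longer segment,
            # so start a fresh segment at base i.
            running, run_start = 0.0, i
        running += scores.get(base, 0.0)
        if running > best:
            best, best_start, best_end = running, run_start, i
    return best, best_start, best_end
```

On the toy sequence ATGGCCTA with the example scoring system, the scan returns the segment 3-6 (GGCC) and stops before the trailing TA, mirroring the example's analysis. Run on the P. furiosus sequence, a correct implementation should report 9570-9890 with a score that rounds to 50.22 (format with `f"{score:.2f}"`), and `seq[start - 1:end]` gives the segment itself.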