MBT 540 Homework Assignment 5

Due Friday Feb. 22

  1. Write a program to find a highest-weight path in a weighted directed acyclic graph. This program should
  2. Write a program that reads in three input DNA sequences (from a FASTA file), creates the edit graph for the three sequences, and outputs this graph in a file in the format described in 1. Recall that, in the edit graph, vertices correspond to triples (i,j,k) where 0 <= i <= n1, 0<=j<=n2, and 0 <= k <= n3, where n1, n2 and n3 are the lengths of the three sequences. There is an edge from (i,j,k) to (i',j',k') whenever i' = i or i+1, j' = j or j+1, and k' = k or k+1, and at least one of the equalities i'=i, j'=j, and k'=k is false. The label you should attach to an edge is the corresponding column of aligned residues & gap characters -- except that you can write it horizontally instead of vertically. For example, the label associated to the edge from (10,37,5) to (11,37,6) would be A-C if A is the 11th residue in the first sequence and C is the 6th residue in the third sequence. Note that different edges in the graph can have the same label.

    The weights attached to an edge should be as follows:

    in the case that there are no gap characters:
      +5 if all three residues are the same
      +3 if two of the three are the same, and the other is different
       0 if all three are different
    in the case that there is exactly one gap character: 
      +2 if both residues are the same
       0 if they are different
    in the case that there are two gap characters: +1
    
  3. Run your program 2 above to produce an output file giving the edit graph for the following three sequences:
    >seq1
    CTGTGACTTCATTCATTTGAGACTGTATCCTGTCATCAGGCAA
    >seq2
    CGTGGAGTATTTGCATCAGTGGCGTGCCGGTTAGATGACAAG
    >seq3
    CGTATAGGTTTACCGGCGGGAGGATGGTATAGCGTAA
    
    Then run your program 1 above on this file to produce a highest scoring path (and its score) giving a highest-scoring alignment of the three sequences; & send me the results at phg@u.washington.edu.