### Due Sunday Feb. 8

1. Write a program to find a highest-weight path in a weighted directed acyclic graph. This program should
• read in an input file that contains a representation of the graph in the following form:
• a list of vertices, with each vertex on a separate line. The line should consist simply of a label (arbitrary character string) for the vertex, which should be unique, i.e. different for different vertices. You may assume that parents precede children in the file, and then use this order in implementing the dynamic programming algorithm.
• a list of edges, with each edge on a separate line. The line should consist of
1. a label for the edge (which need not be unique -- i.e. different edges can have the same label);
2. the label of the edge's beginning vertex;
3. the label of its ending vertex;
4. the numerical weight attached to the edge.
• use dynamic programming to find the highest weight path (with additive, not multiplicative, path weights).
• have 2 output options:
• print the path score, the labels of the beginning and ending vertices on the path, and the labels of the first and last edges in the path; or
• print the path score, and (in order) the labels for all the edges that occur on this path, one label per line. (You will not need this particular option for the current assignment, but you will need it for a later assignment).
2. Write a program that
• reads in 2 input files:
• a FASTA file containing a DNA sequence, and
• a 'scoring scheme' file that indicates a score to be attached to each possible base (A, C, G, T, or other)
• creates the sequence graph for the sequence, as described in lecture (i.e. a 'linked list' with one edge per residue).
• the labels attached to successive vertices should be successive integers, represented as character strings (the first vertex being '0', the second being '1', the third being '2', and so on);
• the label attached to each edge should be the corresponding residue;
• the weight attached to an edge should be the score for that residue, as specified in the scoring scheme file.
• outputs this graph in a file that has the appropriate input format for the program described in #1 above.
3. For the same genome sequence you used in HW assignments 1-3:
1. Run the program in #2 above, using as input files the .fna file for that genome and a scoring scheme file that attaches +1 to an A or T nucleotide, and -2 to a G or C (and assigns 0 to any ambiguity characters in the sequence). Then run your program in #1 above on the resulting output file in order to find the highest scoring segment (= path) of the sequence. Use the first output option (i.e. printing path score, and where the path begins and ends, but not the complete list of edges).
2. Same as above, but using the scoring scheme that attaches +1 to a G or C, and -2 to an A or T.