You must turn in both your results and your computer program.
Please put everything into ONE plain text file. Do not send an archive
of files, a tar file, or a word processing document. Compress the file
using either Unix compress or gzip (if you don't have access to either
of these programs, let us know), and send it as an attachment to both
Phil (phg (at) u.washington.edu) and Alex (aeng (at) uw.edu).
Test sequence 1 (This file contains only one chromosome/contig so you don't need to worry about extracting part of the file.)
Answer for test sequence 1
Test sequence 2 (This file contains only one chromosome/contig so you don't need to worry about extracting part of the file.)
Answer for test sequence 2
Template Details:
- Background Frequency:
Like the nucleotide histogram, but giving the fraction of times (to 4 decimal places) each nucleotide occurs in the sequence and its complement. When computing these, ignore ambiguity-coded nucleotides.
- Count Matrix:
The matrix of nucleotide counts at each position in known translation start sites, with nucleotides in the order A, C, G, T. Ignore occurrences of ambiguity-coded nucleotides at each position.
- Frequency Matrix:
Like count matrix, but indicating the fraction of times (to 4 decimal places) each nucleotide occurs at each position, rather than the total counts.
- Weight Matrix:
Like the frequency matrix, but giving the weight of each nucleotide at each position. Give values to 4 decimal places.
- Maximum Score:
The maximum possible score - i.e., the score for the "ideal" sequence.
- Score Histogram CDS:
Two columns, where the first is the score rounded down to an integer, and the second is the number of times that score occurs among true start sites. Also include one additional row counting all scores less than -50.
- Score Histogram All:
As above, but for all positions in the genome (and its complement).
- Position List:
A list of positions in the genome where scores greater than or equal to 10 occurred but which do NOT correspond to an annotated translation start site. For each, provide the position (in top-strand, 1-based coordinates), the strand (0 for top, 1 for bottom), and the score to 4 decimal places.
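
The template items above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the required implementation: the template does not define the weight formula, so the sketch assumes a log2 ratio of site frequency to background frequency with a floor of -99.0 for nucleotides never seen at a position; it also assumes that ambiguity-coded nucleotides contribute 0 to a score. All function names, and the `offset` parameter (the index within the window of the position being reported), are hypothetical.

```python
import math
from collections import Counter

NUCS = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    # Ambiguity codes are carried through as N.
    return "".join(COMP.get(b, "N") for b in reversed(seq))

def background_frequency(seq):
    # Fractions over the sequence and its complement, ignoring ambiguity codes.
    counts = Counter(b for b in seq + reverse_complement(seq) if b in COMP)
    total = sum(counts[n] for n in NUCS)
    return {n: counts[n] / total for n in NUCS}

def count_matrix(sites):
    # One dict of A/C/G/T counts per position; ambiguity codes are skipped.
    counts = [{n: 0 for n in NUCS} for _ in range(len(sites[0]))]
    for site in sites:
        for i, b in enumerate(site):
            if b in COMP:
                counts[i][b] += 1
    return counts

def frequency_matrix(counts):
    # "or 1" avoids dividing by zero for a column with no unambiguous bases.
    return [{n: col[n] / (sum(col.values()) or 1) for n in NUCS} for col in counts]

def weight_matrix(freqs, bg, floor=-99.0):
    # ASSUMPTION: weight = log2(frequency / background), floor when frequency is 0.
    return [{n: math.log2(col[n] / bg[n]) if col[n] > 0 else floor for n in NUCS}
            for col in freqs]

def max_score(weights):
    # Score of the "ideal" sequence: the best weight at every position.
    return sum(max(col.values()) for col in weights)

def score(window, weights):
    # ASSUMPTION: ambiguity-coded nucleotides contribute 0.
    return sum(col.get(b, 0.0) for col, b in zip(weights, window))

def score_histogram(scores, cutoff=-50):
    # Returns (counts keyed by floor(score), number of scores below cutoff).
    bins, below = Counter(), 0
    for s in scores:
        if s < cutoff:
            below += 1
        else:
            bins[math.floor(s)] += 1
    return bins, below

def scan(seq, weights, threshold=10.0, offset=0):
    # Score every window on both strands; report top-strand, 1-based positions.
    width, hits = len(weights), []
    for strand, s in ((0, seq), (1, reverse_complement(seq))):
        for j in range(len(s) - width + 1):
            sc = score(s[j:j + width], weights)
            if sc >= threshold:
                idx = j + offset
                pos = idx + 1 if strand == 0 else len(seq) - idx
                hits.append((pos, strand, round(sc, 4)))
    return hits
```

To build the final position list, the caller would still need to drop any hit whose (position, strand) pair matches an annotated translation start site. Printing each value with a format such as `f"{x:.4f}"` gives the four decimal places the template asks for.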