Genome 540 Homework Assignment 2

Due Sunday Jan 24, 11:59pm

  1. Write a program that generates a FASTA file of a simulated genome. This program should:
  2. Modify your program from homework 1 to find, for each suffix in the 'forward' strand of genome 1, the length of the longest matching subsequence in genome 2 (or its reverse complement), and to report a histogram of these lengths. Your program should:
  3. Use your program from #1 above to generate a simulated FASTA file based on the genome length and nucleotide frequencies of Advenella kashmirensis.
  4. Run your program in #2 above twice. In both runs, genome 1 should be Bordetella bronchiseptica. In the first run, genome 2 should be Advenella kashmirensis. In the second run, genome 2 should be the simulated genome from #3 above.
  5. Your output for each run of your program in #4 above should include:
  6. Answer this question after the simulated histogram, as shown in the template file:
  7. You must turn in your results and your computer programs, using this template file. Please put everything into ONE plain text file; do not send an archive of files, a tar file, or a word-processing document. Compress it using either Unix compress or gzip (if you don't have access to either of these programs, let us know), and send it as an attachment to both Phil (phg (at) uw.edu) and Dani (dfaivre (at) uw.edu).
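
As a rough illustration of items 1 through 3, here is a minimal Python sketch, not a reference solution. It assumes that "simulated genome" means drawing each base independently from the observed A/C/G/T frequencies, and that a "match" for a suffix is the longest prefix of that suffix occurring contiguously anywhere in genome 2 or its reverse complement. All function names, the 60-column line width, and the seed parameter are hypothetical choices made for this example. The histogram function is a brute-force definition check for toy strings only; it is far too slow for real genomes, where the suffix-array approach from homework 1 is needed.

```python
import random
from collections import Counter

BASES = "ACGT"

def base_frequencies(seq):
    """Fraction of each of A, C, G, T among the A/C/G/T characters of seq."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES}

def simulate_genome(length, freqs, seed=None):
    """Draw `length` bases independently with the given frequencies (assumed model)."""
    rng = random.Random(seed)
    weights = [freqs[b] for b in BASES]
    return "".join(rng.choices(BASES, weights=weights, k=length))

def format_fasta(header, seq, width=60):
    """Return a FASTA record: one '>' header line, then fixed-width sequence lines."""
    lines = [">" + header]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def match_length_histogram(genome1, genome2):
    """Brute force (toy inputs only): for each suffix of genome1, find the
    longest prefix that occurs in genome2 or its reverse complement, and
    count how many suffixes have each match length."""
    targets = (genome2, reverse_complement(genome2))
    hist = Counter()
    for start in range(len(genome1)):
        length = 0
        # Extend the match one base at a time while it still occurs somewhere.
        while start + length < len(genome1) and any(
            genome1[start:start + length + 1] in t for t in targets
        ):
            length += 1
        hist[length] += 1
    return dict(hist)

if __name__ == "__main__":
    toy = "ACGTTGCAAC"  # stand-in for a real genome sequence
    sim = simulate_genome(len(toy), base_frequencies(toy), seed=540)
    print(format_fasta("simulated toy genome", sim), end="")
    print(match_length_histogram(toy, sim))
```

For the real assignment, the length and frequencies would be computed from the Advenella kashmirensis sequence instead of the toy string, and the FASTA text would be written to a file.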