Genome 540 Homework Assignment 2

(Winter Quarter 2022)

Due Sunday Jan 23, 11:59pm

  1. Write a program that generates FASTA files containing two simulated sequences. This program should:
  2. The sequence names in your output files should indicate that the sequences are simulated and which model was used. Note that the nucleotide and dinucleotide counts in your simulated sequences may differ slightly from those in the input because of random sampling.

  3. Use your program from #1 above to generate two simulated FASTA files based on the length and the nucleotide or dinucleotide frequencies of the 10-megabase mouse genomic region used in homework 1.

    Your output should include the information below for each of the three FASTA files (the original and the two simulated ones).

  4. Give all frequencies to 4 decimal places, and use the nucleotide order A, C, G, T for the matrix rows and columns.

  5. Run your program from homework 1 twice. In both runs, sequence 1 should be the 10-megabase human genomic region used in homework 1. In the first run, sequence 2 should be the order-0 Markov model simulated sequence (generated in #3 above). In the second run, sequence 2 should be the order-1 Markov model simulated sequence. The output of each run should be in the same format as for homework 1.
  6. Answer this question after the simulated histogram, as shown in the template file:
  7. You must turn in your results and your computer program (only the program in #1 above -- you don't need to turn in your HW1 program again!), using this file as a template. Please put everything into ONE file - do not send an archive of files or a tar file. After creating a plain text file (NOT a word processing document file) in this format, compress it (using either Unix compress, or gzip -- if you don't have access to either of these programs, let us know), and send it as an attachment to both Phil at phg@uw.edu and CX at cxqiu@uw.edu.
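A minimal Python sketch of the kind of simulation program described in #1 follows. It assumes that the order-0 model draws every base independently from the input's nucleotide frequencies, and that the order-1 model draws each base from P(next base | previous base), estimated from overlapping dinucleotide counts. Several details are hypothetical choices, not requirements of the assignment: non-ACGT symbols (such as N) are ignored when counting, the first base of the order-1 sequence is drawn from the nucleotide frequencies, a base with no observed successor falls back to the nucleotide frequencies, lines are wrapped at 60 columns, and the output file names, sequence names, and random seed are made up.

```python
import random
import sys
from itertools import accumulate

NUCS = "ACGT"  # nucleotide order used throughout: A, C, G, T


def read_fasta(path):
    """Concatenate and upper-case all sequence lines of a single-record FASTA file."""
    with open(path) as fh:
        return "".join(line.strip().upper() for line in fh if not line.startswith(">"))


def count_nucleotides(seq):
    """Counts of A, C, G, T; other symbols such as N are skipped (an assumption)."""
    return [seq.count(n) for n in NUCS]


def count_dinucleotides(seq):
    """4x4 overlapping dinucleotide counts: row = first base, column = second base."""
    index = {n: i for i, n in enumerate(NUCS)}
    matrix = [[0] * 4 for _ in NUCS]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:
            matrix[index[a]][index[b]] += 1
    return matrix


def simulate_order0(length, nuc_counts, rng):
    """Order-0 Markov model: every position is drawn independently."""
    return "".join(rng.choices(NUCS, weights=nuc_counts, k=length))


def simulate_order1(length, nuc_counts, dinuc_counts, rng):
    """Order-1 Markov model: each base is drawn from P(next | previous)."""
    if length == 0:
        return ""
    # A base with no observed successor falls back to nucleotide frequencies
    # (hypothetical choice); cumulative weights are precomputed for speed.
    rows = [row if sum(row) else nuc_counts for row in dinuc_counts]
    cum_rows = [list(accumulate(row)) for row in rows]
    states = range(4)
    prev = rng.choices(states, weights=nuc_counts)[0]  # first base: order-0 draw
    out = [prev]
    for _ in range(length - 1):
        prev = rng.choices(states, cum_weights=cum_rows[prev])[0]
        out.append(prev)
    return "".join(NUCS[i] for i in out)


def write_fasta(path, name, seq, width=60):
    """Write one FASTA record, wrapping the sequence at `width` columns."""
    with open(path, "w") as fh:
        fh.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


def main(fasta_path, seed=540):
    rng = random.Random(seed)  # fixed seed makes reruns reproducible
    seq = read_fasta(fasta_path)
    nuc = count_nucleotides(seq)
    dinuc = count_dinucleotides(seq)
    # Hypothetical output file and sequence names that state the model used.
    write_fasta("sim_order0.fa", "simulated_order0_Markov",
                simulate_order0(len(seq), nuc, rng))
    write_fasta("sim_order1.fa", "simulated_order1_Markov",
                simulate_order1(len(seq), nuc, dinuc, rng))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it with the mouse FASTA file as its only argument. Because the seed is fixed, rerunning the script reproduces the same two simulated sequences; change or drop the seed if you want fresh draws.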
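One possible way to print frequencies to 4 decimal places in A, C, G, T order is sketched below. It assumes that a nucleotide frequency is the count divided by the total number of A, C, G, and T, and that a dinucleotide frequency is the count divided by the total number of dinucleotides. The function name and text layout are hypothetical, so follow the template file's exact format for what you turn in.

```python
NUCS = "ACGT"  # required row/column order


def format_frequencies(nuc_counts, dinuc_counts):
    """Return nucleotide and dinucleotide frequencies as text, 4 decimal places.

    nuc_counts: counts of A, C, G, T in that order.
    dinuc_counts: 4x4 counts, row = first base, column = second base.
    Totals are assumed to be non-zero.
    """
    lines = ["Nucleotide frequencies:"]
    n_total = sum(nuc_counts)
    for base, count in zip(NUCS, nuc_counts):
        lines.append(f"{base}\t{count / n_total:.4f}")

    d_total = sum(sum(row) for row in dinuc_counts)
    lines.append("Dinucleotide frequencies (rows = first base, columns = second base):")
    lines.append("\t" + "\t".join(NUCS))  # column header in A, C, G, T order
    for base, row in zip(NUCS, dinuc_counts):
        lines.append(base + "\t" + "\t".join(f"{c / d_total:.4f}" for c in row))
    return "\n".join(lines)
```

Call it once per FASTA file, passing the A, C, G, T counts and the 4x4 dinucleotide count matrix for that file.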