Genome 540 Homework Assignment 3

Due Sunday Jan. 28

  1. Write a program that identifies regions of elevated copy number using the D-segment algorithm described in lecture, scoring each base by the number of read starts at that base. This program should:
  2. Run your program on this file using the following scoring scheme:
  3. The input file has three columns: chromosome, position, and number of read starts. It was created from the start positions of all reads mapping to chromosome 17 for individual NA12878, sequenced as part of the 1000 Genomes Project. Sequencing was performed on the Illumina platform, and the reads were mapped to the human reference genome hg19 using BWA. The alignments are available in BAM format on the 1000 Genomes Project website.
  4. For this example input, your output file should look like this template; use the same template structure for your output on the actual file. Put everything into ONE plain text file. Do not send an archive, a tar file, or a word-processing document. Compress the file using either Unix compress or gzip (if you don't have access to either program, let us know), and send it as an attachment to both Phil (phg (at) u.washington.edu) and Serena (selenay (at) uw.edu).
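Reading the three-column input described in item 3 might look like the following minimal sketch. It assumes whitespace-separated columns and no header line. The function names read_starts and base_scores are hypothetical, and score_of is a placeholder for whatever count-to-score mapping the assignment's scoring scheme specifies; the scheme itself is not reproduced here.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def read_starts(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Parse (chromosome, position, read-start count) rows.

    Assumes whitespace-separated columns and no header line;
    blank lines are skipped.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        chrom, pos, count = fields
        yield chrom, int(pos), int(count)


def base_scores(rows: Iterable[Tuple[str, int, int]],
                score_of: Callable[[int], float]) -> List[float]:
    """Map each row's read-start count to a per-base score.

    `score_of` is a hypothetical stand-in for the assignment's scoring scheme.
    """
    return [score_of(count) for _chrom, _pos, count in rows]
```

A typical call would be `with open(path) as fh: rows = list(read_starts(fh))`, where path is whatever name the downloaded file has. A D-segment scan treats the score list as consecutive bases, so if the file lists only some positions, the missing ones would need to be filled in before scoring.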
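The D-segment scan from item 1 can be sketched as a single left-to-right pass over per-base scores. This is a minimal sketch of one common formulation and not necessarily identical to the version presented in lecture. It takes a score threshold S > 0 and a drop-off D < 0, closes the current candidate segment whenever the running score falls to 0 or drops at least |D| below its maximum, and reports the candidate if that maximum reaches S. The function name d_segments and its (start, end, score) output format are hypothetical.

```python
from typing import List, Tuple


def d_segments(scores: List[float], S: float, D: float) -> List[Tuple[int, int, float]]:
    """Sketch of a D-segment scan over per-base scores.

    Returns (start, end, score) triples with 1-based, inclusive positions.
    Assumes S > 0 and D < 0. When a candidate segment is closed, the scan
    restarts just after the current position (an assumption of this sketch).
    """
    if S <= 0 or D >= 0:
        raise ValueError("expected S > 0 and D < 0")
    segments = []
    cumul = best = 0.0   # running score, and its maximum since `start`
    start = end = 1      # candidate start, and position where `best` was reached
    for i, s in enumerate(scores, start=1):
        cumul += s
        if cumul >= best:
            best, end = cumul, i
        # Close the candidate if the running score hits 0 or drops by >= |D|.
        if cumul <= 0 or cumul <= best + D:
            if best >= S:
                segments.append((start, end, best))
            cumul = best = 0.0
            start = end = i + 1
    # Report a candidate that is still open at the end of the sequence.
    if best >= S:
        segments.append((start, end, best))
    return segments
```

For example, `d_segments([1, 1, 1, -1, -1, -1, -1, 2, 2], S=2, D=-3)` returns `[(1, 3, 3.0), (8, 9, 4.0)]`: the first segment closes when the running score returns to 0 at position 6, and the second is still open at the end. With `[5, -2, -2, 4]`, S=3, D=-3, the drop-off rule closes the first segment at position 3, giving `[(1, 1, 5.0), (4, 4, 4.0)]`.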