Genome 540 Homework Assignment 1
Due Friday Jan. 16
Policy on late homework: It will be accepted, but penalized.
- Read The evolutionary origin of complex features. R.E. Lenski, C. Ofria, R.T. Pennock, and C. Adami. Nature 423 (2003) 139-145
- Download and begin reading Initial sequencing and analysis of the human genome. International Human Genome
Sequencing Consortium. Nature 409, 860-921 (15
February 2001). (To print this out, I recommend printing the
PDF version, which corresponds exactly to the printed version, rather
than the HTML version.) For next week, read
- introduction and background (pp. 860-863, up to but not including "Strategic issues")
- the section "Broad genomic landscape" (pp. 875-879, up to but not including "Repeat content of the human genome")
- the section "Gene content of the human genome" (starting p. 892) up to but not including "comparative proteome analysis" (p. 901).
- Spend an hour or two exploring the NCBI web site, following as many links as possible,
reading as much material as you can, and getting an idea of the overall
structure of the site.
- Find a bacterium or archaeon whose complete genome
sequence is available on that site and is at least 500,000 bases in
length, and for which one of the organism's initials (i.e. the first letter of its first name, or the first letter of its last
name) is the same as one of your initials
(if none of the organisms has initials meeting this condition, choose one at random).
For this organism, find a file in "FASTA" format (i.e. having a header
line which starts with the character ">" and includes the organism name,
with the sequence itself following on subsequent lines) containing the
complete genome sequence; this file will have a name with the
extension ".fna". Download this file.
- Write a program which reads in the file you downloaded in the previous step,
counts all the nucleotides of each type in the sequence (i.e. the number of A's, the number of C's, etc., including ambiguously coded
ones (N, R, Y, etc.), if any), and prints out
- the name of the file (this should be the same as the file name on the NCBI web site -- i.e. don't rename it!)
- the header line of the file
- a table indicating the nucleotide counts, and the total number of nucleotides.
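One possible shape for such a program is sketched below in Python; it is only an illustration, not a required design. The function names `count_nucleotides` and `format_report` are hypothetical, and the sketch makes several assumptions not stated in the assignment: lower-case letters are folded to upper case, blank lines are skipped, and if the file holds more than one record (e.g. a plasmid) only the first header is reported while all sequence lines are pooled.

```python
import sys
from collections import Counter
from pathlib import Path


def count_nucleotides(path):
    """Return (first header line, Counter of sequence characters) for a FASTA file."""
    header = None
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # assumption: blank lines carry no sequence
            if line.startswith(">"):
                if header is None:
                    header = line  # keep only the first header (assumption)
                continue
            # assumption: treat 'a' and 'A' as the same nucleotide
            counts.update(line.upper())
    return header, counts


def format_report(path, header, counts):
    """Build a compact plain-text report: file name, header, count table, total."""
    lines = [Path(path).name, header or "(no header line found)"]
    for base in sorted(counts):
        lines.append(f"{base}\t{counts[base]}")
    lines.append(f"Total\t{sum(counts.values())}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage (hypothetical script name): python count_bases.py genome.fna
    if len(sys.argv) == 2:
        fasta_path = sys.argv[1]
        print(format_report(fasta_path, *count_nucleotides(fasta_path)))
```

Because the counter tallies whatever characters appear on sequence lines, ambiguity codes such as N, R, or Y show up as their own rows in the table without special handling.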
- Email the output from running your program to me at phg@u.washington.edu AND to Chris Saunders. Please
make it as compact as possible. Do NOT send the code itself. Include
the output in the body of your email message (as plain text), NOT as
an attachment.
- (Not to hand in -- this is a test of whether your basic programming skills will be adequate for future assignments): Write a program that generates 5 million random numbers between 0 and 1 and sorts them by increasing size. It should take you no more than 1/2 hour to write this program, and it should run in a few minutes or less.
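For reference, the self-test above could look something like the following minimal Python sketch. The function name `sorted_random_numbers` and the optional `seed` parameter are hypothetical, added only so that runs can be repeated; the sketch assumes the whole list fits in memory and relies on the language's built-in sort rather than a hand-written one.

```python
import random


def sorted_random_numbers(n, seed=None):
    """Generate n uniform random floats in [0, 1) and return them in increasing order."""
    rng = random.Random(seed)  # seed=None gives a different sequence each run
    values = [rng.random() for _ in range(n)]
    values.sort()  # in-place sort using the built-in algorithm
    return values
```

Calling `sorted_random_numbers(5_000_000)` corresponds to the size asked for in the exercise; a smaller `n` is convenient for checking the result by eye.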