Assignment 9, due Sunday Mar. 11
SYLLABUS & LECTURE SLIDES:
- Math Notation
- Nature paper on Avida
- Nature paper on human genome sequence
- Nature paper on mouse genome sequence
- Siepel et al. paper on PhyloHMMs & sequence conservation
- Rabiner tutorial on HMMs
- HMM scaling tutorial (Tobias Mann)
- Biological Review: Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; GenBank and other sequence databases.
- Programming Review (1st discussion section) -- Jarrett's slides
- Lecture 1: Finding exact matches in sequences. Living organisms as imperfect replication machines; theory of evolution & tree of life; 'artificial life'.
- Lecture 2: Mutations as molecular basis for evolutionary change. CpG mutations/CpG islands. Segmental changes. Mutation fates. Neutral/nearly neutral theories. Mutation & substitution rates.
- Lecture 3: Generalities on algorithms for biological data; directed graphs. Depth structure of directed acyclic graphs (DAGs); trees and linked lists. Dynamic programming on weighted DAGs. Reading: Durbin et al. sections 2.1, 2.2, 2.3.
- Lecture 4: Algorithmic complexity. Maximal-scoring sequence segments. Edit graphs & sequence alignment. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 5: Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global alignment. Multiple sequence alignment; edge weight issues. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12.
- Lecture 6: Linear space algorithms. General & affine gap penalties. Profiles. Reading: Ewens & Grant 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2.
- Lecture 7: Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Comparing alternative models. Failure of the equal-frequency assumption for DNA. Site models. Examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models. Weight matrices for site models. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3.
- Lecture 8: Weight matrices for splice sites in C. elegans. Score distributions. Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction; formal definition. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3.
- Lecture 9: HMM examples: splice sites; 2-state models; 7-state prokaryote genome model. Probabilities of sequences; computing HMM probabilities via the associated weighted DAG (WDAG).
- Lecture 10: HMM parameter estimation: Viterbi training, Baum-Welch (EM) algorithm; specialized techniques. Detection of evolutionarily conserved regions using Phylo-HMMs. Reading: Siepel et al.
- Lecture 11: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 12: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 13: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd). Multiple alignment via profile HMMs. Information theory: entropy.
- Lecture 14: Entropy. Information inequality. Boltzmann distribution. Coding theory/data compression; uniquely decodable codes. Kraft inequality; entropy & expected code length. Information.
- Lecture 15: Relative entropy. Relative entropies of site models. Sequence logos. Random variables; exact probability distribution for weight matrix scores.
- Lecture 16: Exact probability distribution for weight matrix scores. Maximal scoring segments. D-segments.
- Lecture 17: D-segments, relationship to 2-state HMMs. Karlin-Altschul theory.
- Lecture 18: Karlin-Altschul theory; MDL principle and overfitting.
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES:
- Computational Molecular Biology (Washington University)
- Computational Genomics (Ron Shamir, Tel Aviv University)
- Computational Molecular Biology (Doug Brutlag & Lee Kozar, Stanford)
- Computational Genomics (Doug Brutlag, Stanford)
- Representations and Algorithms for Computational Molecular Biology (Russ Altman, Stanford)
- Computational Biology: Genomes, Networks, Evolution (James E. Galagan and Manolis Kellis, MIT)
- Computational Genomics (Adam Siepel, Cornell)
- Computational Biology (Robert Murphy, Carnegie Mellon)
- Introduction to Computational Biology (Steven Skiena, Stony Brook)