Assignment 9, due Sunday Mar. 13
SYLLABUS & LECTURE SLIDES:
Math Notation
Nature paper on Avida
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
- Biological Review: Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; GenBank and other sequence databases.
- Programming Review (1st discussion section): Rupali's slides
- Lecture 1: Finding exact matches in sequences. Living organisms as imperfect replication machines; theory of evolution & tree of life; 'artificial life'.
- Lecture 2: Mutations as molecular basis for evolutionary change. CpG mutations/CpG islands. Segmental changes. Mutation fates. Neutral/nearly neutral theories. Mutation & substitution rates.
- Lecture 3: Mutation & substitution rates. Overview of goals & experimental approaches of molecular biology; role of sequence analysis. Generalities on algorithms for biological data; directed graphs. Depth structure of directed acyclic graphs (DAGs); trees and linked lists. Dynamic programming on weighted DAGs. Reading: Durbin et al. section 2.1, 2.2, 2.3.
- Lecture 4: Dynamic programming on weighted DAGs. Algorithmic complexity. Maximal-scoring sequence segments. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 5: Edit graphs & sequence alignment. Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global. Multiple sequence alignment; edge weight issues. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12.
- Lecture 6: Linear space algorithms. General & affine gap penalties. Profiles. Smith-Waterman special cases, self-similarity. Reading: Ewens & Grant 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2.
- Lecture 7: Speedups based on nucleating word matches: BLAST, FASTA, cross_match. Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Comparing alternative models. Failure of equal frequency assumption for DNA. Site models. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3.
- Lecture 8: Site models; examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models. Comparing alternative models. Weight matrices for site models. Weight matrices for splice sites in C. elegans. Score distributions. Limitations of site models (variable spacing, non-independence). Reading: Ewens & Grant 12.2, 12.3; Durbin et al. chapter 3.
- Lecture 9: Hidden Markov Models: introduction; formal definition; HMM examples: splice sites; 2-state models. Reading: Ewens & Grant 1.14, Appendix B.10.
- Lecture 10: HMM examples: 7-state prokaryote genome model. Probabilities of sequences; computing HMM probabilities via associated WDAG.
- Lecture 11: HMM Parameter estimation: Viterbi training, Baum-Welch (EM) algorithm; specialized techniques. Detection of evolutionarily conserved regions using Phylo-HMMs. Reading: Siepel et al.
- Lecture 12: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Rupali's lecture on BWT/BWA
- Lecture 13: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 14: Multiple alignment via profile HMMs. Information theory: Entropy. Information inequality. Boltzmann distribution.
- Lecture 15: Coding theory/data compression, uniquely decodable codes. Kraft inequality, entropy & expected code length. Information; relative entropy. Relative entropies of site models.
- Lecture 16: Relative entropies of site models. Sequence logos. Random variables; exact probability distribution for weight matrix scores.
- Lecture 17: Exact probability distribution for weight matrix scores. Maximal scoring segments. D-segments, relationship to 2-state HMMs.
- Lecture 18: Karlin-Altschul theory, exact probability distributions for segment scores.
- Lecture 19: Karlin-Altschul theory.
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES:
- Computational Molecular Biology (Washington University)
- Computational Genomics (Ron Shamir, Tel Aviv University)
- Computational Molecular Biology (Doug Brutlag & Lee Kozar, Stanford)
- Computational Genomics (Doug Brutlag, Stanford)
- Representations and Algorithms for Computational Molecular Biology (Russ Altman, Stanford)
- Computational Biology: Genomes, Networks, Evolution (James E. Galagan and Manolis Kellis, MIT)
- Computational Genomics (Adam Siepel, Cornell)
- Computational Biology (Robert Murphy, Carnegie Mellon)
- Introduction to Computational Biology (Steven Skiena, Stony Brook)