Assignment 9, due Sunday Mar 12
SYLLABUS & LECTURE SLIDES:
Math Notation
Nature paper on Avida
Avida web site
Nature paper on human genome sequence
Nature paper on mouse genome sequence
Siepel et al. paper on PhyloHMMs & sequence conservation
Rabiner tutorial on HMMs
HMM scaling tutorial (Tobias Mann)
Supervised learning tutorial
- Biological Review : Gene and genome structure in prokaryotes and eukaryotes; the genetic code & codon usage; "global" genome organization. Sources and characteristics of sequence data; Genbank and other sequence databases.
- Lecture 1: Course overview.
- Lecture 2: Finding exact matches in sequences using suffix arrays. Algorithmic complexity. Directed graphs. Reading: Durbin et al. section 2.1, 2.2, 2.3.
- Discussion Section 1: HW1 and general programming tips.
- Lecture 3: Depth structure of directed acyclic graphs (DAGs); trees and linked lists. Dynamic programming on weighted DAGs. Reading: Durbin et al. 2.4, 2.5, 2.6.
- Lecture 4: Dynamic programming on weighted DAGs. Maximal-scoring sequence segments. Reading: Durbin et al. 6.1, 6.2, 6.3; Ewens & Grant 1.1, 1.2, 1.12, 3.1, 3.2, 3.4, 3.6, 5.2, 9.1, 9.2
- Discussion Section 2: HW1, HW2, minimum path algorithms, and memoization.
- Lecture 5: Maximal-scoring sequence segments. Reading: Ewens & Grant 5.3.1, 5.3.2, 12.1, 12.2, 12.3; Durbin et al. chapter 3
- Lecture 6: D-segments, relationship to 2-state HMMs. Sequence alignment.
- Discussion Section 3: HW1 comments, HW2 questions, D-segment algorithm, and coding guidelines.
- Lecture 7: Edit graphs & sequence alignment. Smith-Waterman algorithm. Needleman-Wunsch algorithm. Local vs. global. Reading: Ewens & Grant 12.2, 12.3, 1.14, Appendix B.10; Durbin et al. chapter 3
- Lecture 8: Multiple sequence alignment. Linear space algorithms.
- Discussion Section 4: HW2 comments, HW3 questions, edit graph optimization, and data structures.
- Lecture 9: Better scoring models: General & affine gap penalties; profiles. Smith-Waterman special cases.
- Lecture 10: Word nucleation approaches/BLAST. Probability models on sequences; review of basic probability theory: probability spaces, conditional probabilities, independence. Failure of equal frequency assumption for DNA.
- Discussion Section 5: HW4 questions, BLAST, and the Stack vs. the Heap.
- Lecture 11: Site models. Site model examples: 3' splice sites, 5' splice sites, protein motifs. Site probability models. Comparing alternative models. Neyman-Pearson lemma. Weight matrices for site models. Weight matrices for splice sites in C. elegans.
- Lecture 12: Score distributions. Limitations of site models (variable spacing, non-independence). Hidden Markov Models: introduction Reading: Siepel et al.
- Discussion Section 6: HW5 tips and questions, motif-finding algorithms, and valgrind.
- Lecture 13: Hidden Markov Models: formal definition. HMM examples: -- splice sites; 2-state models; 7-state prokaryote genome model.
- Lecture 14: Probabilities of sequences. Computing HMM probabilities via associated WDAG.HMM Parameter estimation: Viterbi training.
- Discussion Section 7: Viterbi training, HW6 tips, GENSCAN, and amortized analysis.
- Lecture 15: Forward-backward algorithm. Baum-Welch (EM) algorithm; techniques for finding global maxima in likelihood surface.
- Lecture 16: Detection of evolutionarily conserved regions using Phylo-HMMs.
- Discussion Section 8: Baum-Welch, alternative update calculations, and NP-complete proofs.
- Lecture 17: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd).
- Lecture 18: Detection of evolutionarily conserved regions using Phylo-HMMs (cont'd). Distribution of maximal segment scores in background sequence. Karlin-Altschul theory.
- Discussion Section 9: HW 8, Expectation-Maximization, and Markov Chain Monte Carlo (MCMC).
- Lecture 19(written portion notes): Brief introduction to microbiome research and a simple algorithm for designing minimal microbial communities with a specified metabolic capacity.
- Lecture 20: Karlin-Altschul theory (cont'd). Information theory.
- Discussion Section 10: HW 9, regular expressions, sed, AWK, and Git (solo and collaborative).
C/C++ PROGRAMMING GUIDES:
OTHER RELEVANT COURSES AT UW:
COMPUTATIONAL BIOLOGY COURSES AT OTHER SITES: