This course will give a survey of mathematical methods in statistical modeling, inference, and learning with an emphasis on techniques widely used in speech recognition, computational linguistics and vision. Each lecture is organized around a theorem, algorithm, or technical concepts such as the channel capacity theorem, PAC-Bayesian theorem, the junction tree algorithm, boosting, support vector machines, VC dimensions and manifolds. No prerequisites.
The notes for each lecture contain problems at the end. The problems for the lectures in any given week are due at lecture on the Friday of the following week. There is a 25% grade penalty for late problem sets if they are no more than one week late. Problem sets late by more than one week have a 50% greade reduction.
· Entropy and Data Compression
· Convexity, Jensen's Inequality, and KL Divergence
· Random Variables, Mutual Information, and Channel Capacity
· Hidden Markov Models. Viterbi and Forward-Backward.
· Probabilistic Context Free Grammars (PCFGs). Viterbi and Inside-Outside.
· A*
· Dijkstra Lightest Parse, Generaized A* , Paper on Generalized A*
· Bayesian Networks and Markov Random Fields
· Junction Trees, Tree Width, and the Hammersley-Clifford Theorem
· Min-Cut Segmentation.
· Spectral Clustering.
· The Covariance Matrix. Multivariate Central Limit Theorem. Principle Component Analysis.
· Conjugate Priors. Dirichlett Priors. Gaussian Priors.
· Least Squares Regression. Ridge Regression.
· Kernels.
· Logistic Regression. (more slides)
· MAP EM, Structural EM, Leave-One-Out EM.
· On-line learning.
· Semi-Supervised Regression.