I hope you find the material on this site helpful in your quest to learn.
My notes are not copyrighted in any way. In fact, you are encouraged to
copy them!
Introduction to Probability
Pattern Recognition
Learning Patterns
- Basic definitions, The Bayes Rule, An example, Learning with a teacher,
Definition of consistency, Empirical Risk Minimization and the need for
Uniform Laws of Large Numbers. (The two central definitions are sketched
after the references below.)
- Reference: Mostly from "A Probabilistic Theory of Pattern Recognition"
by Devroye, Györfi, and Lugosi. 1996. Springer.
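- Aside: the two definitions everything below revolves around, in LaTeX
(notation as in Devroye-Györfi-Lugosi; a quick sketch, not a substitute
for the lecture):

    \eta(x) = \mathbb{P}(Y = 1 \mid X = x)      % a posteriori probability
    g^*(x)  = \mathbf{1}\{\eta(x) > 1/2\}       % the Bayes rule
    L^*     = \mathbb{P}(g^*(X) \neq Y)         % the Bayes risk
    g_n     = \arg\min_{g \in \mathcal{C}}
              \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{g(X_i) \neq Y_i\}
                                                % Empirical Risk Minimization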
Laws of Large Numbers
Uniform Laws
- What's a Law of Large Numbers again?, The Weak Law, Markov's and
Chebyshev's inequalities, Chernoff's Method, The Strong Law,
Hoeffding's inequality, The modern Strong Uniform Laws, McDiarmid's
inequality, Enter Combinatorics. (The key inequalities are stated after
the references below.)
- References: Mostly from "Combinatorial Methods in Density Estimation"
by Devroye and Lugosi. 2001. Springer.
- Classics: Chernoff 1952 with his method. To learn more about the
powerful new-age concentration of measure phenomenon, see Talagrand's
1996 survey and the references therein.
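- Aside: the chain of inequalities from this lecture, in LaTeX for quick
reference (standard forms; constants may differ slightly from the book's):

    \text{Markov: } \mathbb{P}(X \geq t) \leq \frac{\mathbb{E}X}{t},
        \quad X \geq 0,\ t > 0
    \text{Chebyshev: } \mathbb{P}(|X - \mathbb{E}X| \geq t)
        \leq \frac{\mathrm{Var}\,X}{t^2}
    \text{Hoeffding: }
        \mathbb{P}\Big(\Big|\tfrac{1}{n}\textstyle\sum_{i=1}^n X_i
        - \mathbb{E}X_1\Big| \geq t\Big) \leq 2 e^{-2nt^2},
        \quad X_i \in [0,1] \text{ i.i.d.}
    \text{McDiarmid: } \mathbb{P}(|f - \mathbb{E}f| \geq t)
        \leq 2 \exp\Big(-\frac{2t^2}{\sum_i c_i^2}\Big),
        \quad f \text{ with bounded differences } c_i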
VC Dimension
Examples and Tools
- Direct computation of VC dimension for simple classes: half-lines and
intervals. General properties of shatter coefficients. South-West sets
and rectangles in dimension d. The simplest (as far as I know) proof
of Sauer's Lemma. Upper bounds on shatter coefficients in terms of
VC dimension. The Steele-Dudley general class. (A brute-force shattering
check is sketched after the references below.)
- References: The ones above plus a bit from Pollard's 1984 book:
"Convergence of Stochastic Processes". Springer.
Back to the Beginning: What have we learned?
PAC Bounds
- Decomposition of errors. Probably Approximately Correct bounds.
Inversion. PAC bounds for the true risk of an estimator.
The finite case. Interpretation of VC dimension and shatter coefficients
as measures of the effective size of an infinite class. (The finite-case
bound and its inversion are stated after the references below.)
- References: The ones above plus Boucheron-Bousquet-Lugosi 2004.
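- Aside: the finite case in one line, in LaTeX. Hoeffding plus a union
bound over the |C| rules, then "inversion" (solve for epsilon at
confidence delta): with probability at least 1 - \delta,

    \sup_{g \in \mathcal{C}} |\hat{L}_n(g) - L(g)|
        \leq \sqrt{\frac{\ln(2|\mathcal{C}|/\delta)}{2n}}.

For an infinite class the union bound fails, and the shatter coefficient
S(\mathcal{C}, n) takes over the role of |\mathcal{C}| in the
Vapnik-Chervonenkis inequality.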
Linear Rules and Fat Margins
Classification with Hyperplanes
- Here is the line: the points on this side are +1, the ones on the other
side are -1. Fat lines have better generalization power. Definition of
margin and of a plane in canonical position. (Both definitions are stated
after the references below.)
- References: SVM and Kernel Methods. Slides by Bernhard Schölkopf.
learning-with-kernels.org
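- Aside: the two definitions, in LaTeX. The hyperplane (w, b) is in
canonical position with respect to the sample when

    \min_{i} |\langle w, x_i \rangle + b| = 1,

and then the margin (the distance from the hyperplane to the closest
sample point) is

    \frac{\min_i |\langle w, x_i \rangle + b|}{\|w\|} = \frac{1}{\|w\|}.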
Support Vector Machines
SVMs
- Finding the hyperplane with largest margin. The Karush-Kuhn-Tucker
(KKT) conditions. The Dual Problem. The Support Vectors of SVMs.
The Quadratic Programming problem of SVMs. (A sketch of the dual QP in
code follows the references below.)
- References: Bernhard Schölkopf's slides above and
Tamas Terlaky's Notes on KKT. Examples in Octave.
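- Aside: the worked examples are in Octave; here is the same dual QP as a
Python sketch, assuming the cvxopt solver (an outside dependency, not
part of the notes).

    import numpy as np
    from cvxopt import matrix, solvers

    def hard_margin_svm(X, y):
        # Dual: max  sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j <x_i, x_j>
        #       s.t. a_i >= 0  and  sum_i a_i y_i = 0.
        # cvxopt solves: min (1/2) a'Pa + q'a  s.t.  Ga <= h,  Aa = b.
        n = X.shape[0]
        K = X @ X.T                                # matrix of inner products
        P = matrix(np.outer(y, y) * K)
        q = matrix(-np.ones(n))
        G = matrix(-np.eye(n))                     # encodes a_i >= 0
        h = matrix(np.zeros(n))
        A = matrix(y.reshape(1, n).astype(float))  # encodes sum_i a_i y_i = 0
        b = matrix(0.0)
        a = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
        sv = a > 1e-6                              # support vectors: a_i > 0 (KKT)
        w = (a * y)[sv] @ X[sv]                    # w = sum_i a_i y_i x_i
        b0 = np.mean(y[sv] - X[sv] @ w)            # from y_i(<w, x_i> + b) = 1
        return w, b0

    X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, b0 = hard_margin_svm(X, y)                  # a separable toy problem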
The Kernel Trick
RKHS
- Introduction. Feature Space. Simple Example. Mercer Kernel. (A tiny
feature-map example in code follows the references below.)
- References: Bernhard Schölkopf's slides above, the Aronszajn 1950
classic, and the Cucker/Smale 2001 classic.
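- Aside: the trick in a few lines of Python (my own toy example): the
homogeneous quadratic kernel computes an inner product in feature space
without ever visiting feature space.

    import numpy as np

    def phi(x):
        # Explicit feature map for the kernel below, for x in R^2:
        # phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) lives in R^3.
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    def k(x, z):
        # Mercer kernel k(x, z) = <x, z>^2 = <phi(x), phi(z)>.
        return (x @ z) ** 2

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])
    print(phi(x) @ phi(z))   # 16.0, computed in feature space
    print(k(x, z))           # 16.0, computed without leaving R^2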
Regression
Regression in Hilbert Space
- Classical Regression. Kernel Regression. SV Regression. (A kernel
regression sketch in code follows the references below.)
- References: Bernhard Schölkopf's slides above, Maple Code with an
example of how to use it, and Alex Smola's 1998 Thesis.
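- Aside: a minimal Python sketch of kernel regression via the representer
theorem, here in its kernel ridge form (not the SV regression of the
slides; the code and parameter names are my own):

    import numpy as np

    def rbf(X, Z, gamma=10.0):
        # Gaussian (Mercer) kernel matrix: k(x, z) = exp(-gamma ||x - z||^2).
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def fit(X, y, lam=1e-3):
        # Representer theorem: f(x) = sum_i alpha_i k(x_i, x),
        # with alpha solving the linear system (K + lam I) alpha = y.
        return np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)

    def predict(Xtrain, alpha, Xnew):
        return rbf(Xnew, Xtrain) @ alpha

    X = np.linspace(0.0, 1.0, 20)[:, None]
    y = np.sin(2 * np.pi * X[:, 0])
    alpha = fit(X, y)
    print(predict(X, alpha, np.array([[0.25]])))   # close to sin(pi/2) = 1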
Kernel PCA
- Any algorithm that depends on the observed data only through its Gram
matrix (i.e. its matrix of inner products) can be kernelized! (A sketch
in code closes this page.)
- Classical example: Principal Component Analysis.
- The original paper of
Schölkopf, Smola, and Müller still says it best.
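- A minimal Python sketch of the recipe (my own code; see the paper for
the derivation): double-center the Gram matrix, take its top
eigenvectors, and project.

    import numpy as np

    def kernel_pca(K, n_components=2):
        # Double-center the Gram matrix, which centers the (implicit)
        # feature vectors in feature space.
        n = K.shape[0]
        one = np.full((n, n), 1.0 / n)
        Kc = K - one @ K - K @ one + one @ K @ one
        # Eigendecompose; eigh returns eigenvalues in ascending order.
        vals, vecs = np.linalg.eigh(Kc)
        vals, vecs = vals[::-1], vecs[:, ::-1]
        # Normalize each eigenvector a so that lambda * <a, a> = 1.
        a = vecs[:, :n_components] / np.sqrt(
            np.maximum(vals[:n_components], 1e-12))
        # Coordinates of the training points on the principal components.
        return Kc @ a

    # With the plain inner-product Gram matrix this reduces to classical PCA:
    X = np.random.randn(50, 5)
    scores = kernel_pca(X @ X.T)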