|
VC Dimension: Examples and Tools
Carlos C. Rodríguez
https://omega0.xyz/omega8008/
Recall the Definitions
Let A be a class of subsets of Rd. The shatter coefficients of A are
denoted by S (n,A ) and defined as the maximum number of subsets of a set of
n elements that appear in the elemts of A . We say that the class A shatters
a given set of n elements if each of its 2n subsets are picked by the
members of A . The Vapnik-Chervonenkis (VC) dimension of A is the size of the
largest set that can be shattered by A . It is denoted by V=V(A ). Thus,
if V < ¥ we must have S (n,A )
=2n for each n £ V and
S (V+1,A ) < 2V+1. In words: If the VC dimension is V the class A
shatters some sets with V elements but it shatters no set with
V+1 (or more) elements. Here are the simplests examples.
Simplest Examples
- Half lines in R
-
A
= {(-¥,x] : x Î R} . Clearly, S (n,A )
= n+1 and so V=1. Notice,
that,
S (n,A )
= n+1 = |
æ è
|
n
0
|
ö ø
|
+ |
æ è
|
n
V
|
ö ø
|
|
|
- Intervals in R
-
A
= {(a,b): a,b Î R} . To compute S (n,A ) just think of the (n+1) spaces
defined by n different real numbers. Any choice of two of these spaces corresponds
to an (a,b) picking up the points inside. Thus,
S (n,A )
= |
æ è
|
n+1
2
|
ö ø
|
+ 1 |
|
where the extra interval is the one picking none of the n points. Therefore,
V=2 and again,
S (n,A )
= |
n(n+1)
2
|
+1 = |
æ è
|
n
0
|
ö ø
|
+ |
æ è
|
n
1
|
ö ø
|
+ |
æ è
|
n
V
|
ö ø
|
|
|
- Convex Polygons in R2
-
It is easy to see that for this class V=¥. Just think of the n points
lying on the unit circle. Every subset of these n points is picked by the
convex polygon having the elements of the subset as vertices.
To be able to build new examples easily, we show some general properties of
shatter coefficients.
Let A and B be two classes of subsets of Rd and let * be a set
operation. We define,
A (*)B
= { A*B : A Î A , B Î B } |
|
We have,
- S (n,A (c)) = S (n,A ) .
- S (n,A (Ç)B ) £ S (n,A ) S (n,B ).
- S (n,A (È)B ) £ S (n,A ) S (n,B ).
- S (m+n,A ) £ S (m,A )S (n,A ).
- S (n,A (×)B ) £ S (n,A ) S (n,B ).
Proof: The first one is trivial. Let's show the second. Consider a fix
set {x1,¼,xn} of n points. The class A picks the subsets
A1,A2,¼,Ak with k £ S (n,A ) .Now if Aj has nj
elements then B picks at most S (nj,B ) subsets of Aj. Thus,
S (n,A (Ç)B ) £ |
S (n,A ) å
j=1
|
S (nj,B ) |
|
the result follows by noticing that nj £ n. Three follows from one
and two. To show four procceed as above by choosing a set of (m+n) points
with m of the points marked and defining A1,¼,Ak
now with k = S (m+n,A ) and split each Aj as,
with Aj1 containing all of the marked points in Aj. The total
number of sets Aj1 is at most S (m,A ) and the total number
of different sets of type Aj2 is at most S (n,A ) . Thus,
k £ S (m,A )S (n,A ) . The proof for five is similar to the second case.·
More Examples
It is not difficult to generalize the previous examples to Rd.
- South-West Intervals in Rd
-
A
= {(-¥,a1]×(-¥,a2]¼×(-¥,ad] : aj Î R } . By what we found for the case d=1 and property 5 above,
we can deduce that,
S (n,A ) £ (n+1)d and looking ahead we guess V = d. This can be shown
by noticing that A shatters the d cannonical vectors E={e1,¼,ed}
with ej=(0,¼,0,1,0,¼,0) (i.e. all zeroes except a 1 in the
jth entry). Any B Ì E is picked by a member of A , e.g., by the one
with aj = 1+ej/2 where ej = +1 if ej Î B and
ej = -1 otherwise. Furthermore, no set with d+1 points can be
shatter by A since it is imposible to pick only the d points where the
first has the largest first coordinate, the second has the largest second
coordinate, ¼, the dth has the largest dth coordinate, and not the other.
Thus, V=d. ·
- All the rectangles in Rd
-
For this class V = 2d. Again the case d=1 and property 5 above shows that,
S (n,A ) £ |
æ è
|
n(n+1)
2
|
+1 |
ö ø
|
d
|
< (n+1)2d |
|
and looking ahead we'd guess V = 2d. A proof along the lines of the previous
example is straight forward.
Power Tools
The following simple result is known as Sauer's Lemma. It provides a way to
bound shatter coefficients in terms of the VC dimension.
- Sauer's Lemma
-
If V(A ) < ¥, for all n
S (n,A ) £ |
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
|
|
Proof:
By induction on n. It is true for n £ V since for these n,
S (n,A )
= 2n = |
n å
i=0
|
|
æ è
|
n
i
|
ö ø
|
= |
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
. |
|
Now let us assume the result true for n and deduce it for n+1.
Clearly S (n+1,A ) is at most 2S (n,A ) (see property 4 above) but
this bound can be reduced by using the fact that the vc dimension is
V so we know that A shatters no set with V+1 points. On the other
hand, a set with n+1 points has n choose V more subsets with
V+1 elements than a set with n points has, and each of these must
hide at least one new subset that cannot be picked out by A . Hence,
applying the induction hypothesis, and the formula for Pascal's
triangle satisfied by the binomial coefficients, we obtain,
| |
|
| |
| |
|
2 |
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
- |
æ è
|
n
V
|
ö ø
|
|
| |
| |
|
|
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
+ |
V å
i=1
|
|
æ è
|
n
i-1
|
ö ø
|
|
| |
| |
|
| |
|
The promised bounds follow.
- Corollary
-
If V < ¥, for all n,
and when n ³ V,
S (n,A ) £ |
æ è
|
ne
V
|
ö ø
|
V
|
. |
|
Proof:
We have,
|
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
£ |
V å
i=0
|
|
ni
i!
|
£ |
V å
i=0
|
|
niV!
i!(V-i)!
|
= (n+1)V |
|
Also, if V/n £ 1,
|
|
æ è
|
V
n
|
ö ø
|
V
|
|
V å
i=0
|
|
æ è
|
n
i
|
ö ø
|
|
|
|
|
V å
i=0
|
|
æ è
|
V
n
|
ö ø
|
i
|
|
æ è
|
n
i
|
ö ø
|
|
| |
| |
|
|
n å
i=0
|
|
æ è
|
V
n
|
ö ø
|
i
|
|
æ è
|
n
i
|
ö ø
|
|
| |
| |
|
| |
|
The other useful tool is,
- Theorem
-
If G is an m-dimensional vector space of real-valued functions defined
on Rd then the class of sets,
A
= { {x : g(x) ³ 0}: g Î G } |
|
has VC dimension V £ m.
Proof:
Take a set of m+1 arbitrary points {x1,¼,xm+1} in Rd. We show
one subset that can never be picked out by A . To define this special subset
consider the linear transformation L from G to Rm+1 defined by,
L(g) = (g(x1),¼,g(xm+1)). |
|
Since the image space LG is of dimension at most m, there must
exist a non-zero vector g Î Rm+1 orthognal to the image
space LG . Thus, there exist scalars g1,¼,gm+1
not all zero such that,
|
m+1 å
i=1
|
gig(xi) = 0 for each g Î G . |
|
Separate the sum above according to whether gi
is negative (i Î {-}) or not (i Î {+}), to obtain,
|
å
{+}
|
gi g(xi) = |
å
{-}
|
(-gi) g(xi). |
|
By replacing g by -g if necessary, we may assume (wlog since
g ¹ 0) that {-} is non-empty
and this shows that it is imposible for A to only pick out the xi with
i Î {+}. For if there was a g Î G with {x:g(x) ³ 0} performing the
trick, it would have to make the lhs of the above equation ³ 0
but the rhs < 0 since {-} is non-empty and g(xi) < 0 for all
i Î {-}.·
File translated from
TEX
by
TTH,
version 3.63. On 5 Oct 2004, 10:13.
|