Other available formats: [.PDF] [.TEX]

VC Dimension: Examples and Tools

Carlos C. Rodríguez
https://omega0.xyz/omega8008/

Recall the Definitions

Let A be a class of subsets of R^d. The shatter coefficients of A are denoted by S (n,A ) and defined as the maximum number of subsets of a set of n elements that appear in the elemts of A . We say that the class A shatters a given set of n elements if each of its 2ⁿ subsets are picked by the members of A . The Vapnik-Chervonenkis (VC) dimension of A is the size of the largest set that can be shattered by A . It is denoted by V=V(A ). Thus, if V < ¥ we must have S (n,A ) =2ⁿ for each n £ V and S (V+1,A ) < 2^V+1. In words: If the VC dimension is V the class A shatters some sets with V elements but it shatters no set with V+1 (or more) elements. Here are the simplests examples.

Simplest Examples

Half lines in R

A = {(-¥,x] : x Î R} . Clearly, S (n,A ) = n+1 and so V=1. Notice, that,

S (n,A ) = n+1 =

æ
è

n
0

ö
ø

æ
è

n
V

ö
ø

Intervals in R

A = {(a,b): a,b Î R} . To compute S (n,A ) just think of the (n+1) spaces defined by n different real numbers. Any choice of two of these spaces corresponds to an (a,b) picking up the points inside. Thus,

S (n,A ) =

æ
è

n+1
2

ö
ø

+ 1

where the extra interval is the one picking none of the n points. Therefore, V=2 and again,

S (n,A ) =

n(n+1)

+1 =

æ
è

n
0

ö
ø

æ
è

n
1

ö
ø

æ
è

n
V

ö
ø

Convex Polygons in R²

It is easy to see that for this class V=¥. Just think of the n points lying on the unit circle. Every subset of these n points is picked by the convex polygon having the elements of the subset as vertices.

To be able to build new examples easily, we show some general properties of shatter coefficients.

Let A and B be two classes of subsets of R^d and let * be a set operation. We define,

A (*)B = { A*B : A Î A , B Î B }

We have,

S (n,A ^(c)) = S (n,A ) .
S (n,A (Ç)B ) £ S (n,A ) S (n,B ).
S (n,A (È)B ) £ S (n,A ) S (n,B ).
S (m+n,A ) £ S (m,A )S (n,A ).
S (n,A (×)B ) £ S (n,A ) S (n,B ).

Proof: The first one is trivial. Let's show the second. Consider a fix set {x₁,¼,x_n} of n points. The class A picks the subsets A₁,A₂,¼,A_k with k £ S (n,A ) .Now if A_j has n_j elements then B picks at most S (n_j,B ) subsets of A_j. Thus,

S (n,A (Ç)B ) £

S (n,A )
å
j=1

S (n_j,B )

the result follows by noticing that n_j £ n. Three follows from one and two. To show four procceed as above by choosing a set of (m+n) points with m of the points marked and defining A₁,¼,A_k now with k = S (m+n,A ) and split each A_j as,

A_j = A_j1ÈA_j2

with A_j1 containing all of the marked points in A_j. The total number of sets A_j1 is at most S (m,A ) and the total number of different sets of type A_j2 is at most S (n,A ) . Thus, k £ S (m,A )S (n,A ) . The proof for five is similar to the second case.·

More Examples

It is not difficult to generalize the previous examples to R^d.

South-West Intervals in R^d

A = {(-¥,a₁]×(-¥,a₂]¼×(-¥,a_d] : a_j Î R } . By what we found for the case d=1 and property 5 above, we can deduce that, S (n,A ) £ (n+1)^d and looking ahead we guess V = d. This can be shown by noticing that A shatters the d cannonical vectors E={e₁,¼,e_d} with e_j=(0,¼,0,1,0,¼,0) (i.e. all zeroes except a 1 in the jth entry). Any B Ì E is picked by a member of A , e.g., by the one with a_j = 1+e_j/2 where e_j = +1 if e_j Î B and e_j = -1 otherwise. Furthermore, no set with d+1 points can be shatter by A since it is imposible to pick only the d points where the first has the largest first coordinate, the second has the largest second coordinate, ¼, the dth has the largest dth coordinate, and not the other. Thus, V=d. ·

All the rectangles in R^d

For this class V = 2d. Again the case d=1 and property 5 above shows that,

S (n,A ) £

æ
è

n(n+1)

ö
ø

< (n+1)^2d

and looking ahead we'd guess V = 2d. A proof along the lines of the previous example is straight forward.

Power Tools

The following simple result is known as Sauer's Lemma. It provides a way to bound shatter coefficients in terms of the VC dimension.

Sauer's Lemma

If V(A ) < ¥, for all n

S (n,A ) £

V
å
i=0

æ
è

n
i

ö
ø

Proof: By induction on n. It is true for n £ V since for these n,

S (n,A ) = 2ⁿ =

n
å
i=0

æ
è

n
i

ö
ø

V
å
i=0

æ
è

n
i

ö
ø

Now let us assume the result true for n and deduce it for n+1. Clearly S (n+1,A ) is at most 2S (n,A ) (see property 4 above) but this bound can be reduced by using the fact that the vc dimension is V so we know that A shatters no set with V+1 points. On the other hand, a set with n+1 points has n choose V more subsets with V+1 elements than a set with n points has, and each of these must hide at least one new subset that cannot be picked out by A . Hence, applying the induction hypothesis, and the formula for Pascal's triangle satisfied by the binomial coefficients, we obtain,

S (n+1,A )

2 S (n,A ) -

æ
è

n
V

ö
ø

V
å
i=0

æ
è

n
i

ö
ø

æ
è

n
V

ö
ø

V
å
i=0

æ
è

n
i

ö
ø

V
å
i=1

æ
è

n
i-1

ö
ø

V
å
i=0

æ
è

n+1
i

ö
ø

The promised bounds follow.

Corollary

If V < ¥, for all n,

S (n,A ) £ (n+1)^V,

and when n ³ V,

S (n,A ) £

æ
è

ö
ø

Proof: We have,

V
å
i=0

æ
è

n
i

ö
ø

V
å
i=0

nⁱ

V
å
i=0

nⁱV!

i!(V-i)!

= (n+1)^V

Also, if V/n £ 1,

æ
è

ö
ø

V
å
i=0

æ
è

n
i

ö
ø

V
å
i=0

æ
è

ö
ø

æ
è

n
i

ö
ø

n
å
i=0

æ
è

ö
ø

æ
è

n
i

ö
ø

æ
è

1 +

ö
ø

£ e^V. ·

The other useful tool is,

Theorem

If G is an m-dimensional vector space of real-valued functions defined on R^d then the class of sets,

A = { {x : g(x) ³ 0}: g Î G }

has VC dimension V £ m.

Proof: Take a set of m+1 arbitrary points {x₁,¼,x_m+1} in R^d. We show one subset that can never be picked out by A . To define this special subset consider the linear transformation L from G to R^m+1 defined by,

L(g) = (g(x₁),¼,g(x_m+1)).

Since the image space LG is of dimension at most m, there must exist a non-zero vector g Î R^m+1 orthognal to the image space LG . Thus, there exist scalars g₁,¼,g_m+1 not all zero such that,

m+1
å
i=1

g_ig(x_i) = 0 for each g Î G .

Separate the sum above according to whether g_i is negative (i Î {-}) or not (i Î {+}), to obtain,

å
{+}

g_i g(x_i) =

å
{-}

(-g_i) g(x_i).

By replacing g by -g if necessary, we may assume (wlog since g ¹ 0) that {-} is non-empty and this shows that it is imposible for A to only pick out the x_i with i Î {+}. For if there was a g Î G with {x:g(x) ³ 0} performing the trick, it would have to make the lhs of the above equation ³ 0 but the rhs < 0 since {-} is non-empty and g(x_i) < 0 for all i Î {-}.·

File translated from T_EX by T_TH, version 3.63.
On 5 Oct 2004, 10:13.

https://omega0.xyz/omega8008/

VC Dimension: Examples and Tools

Carlos C. Rodríguez https://omega0.xyz/omega8008/

Recall the Definitions

Simplest Examples

More Examples

Power Tools

Carlos C. Rodríguez
https://omega0.xyz/omega8008/