Lecture IV

An Introduction to Markov Chain Monte Carlo

Abstract

The Rejection and Acceptance Complement Methods.

Exact Sampling Methods

The Metropolis algorithm and its variants are great, for they can be used to generate ergodic Markov chains with, in principle, any pre-specified stationary distribution. Just choose an arbitrary starting point and eventually the chain will begin sampling from its stationary distribution. The problem is, of course, that (usually) we can't be 100% sure about when this will happen. Exact sampling methods, when available, are therefore more reliable and often more efficient than MCMC methods. For this reason MCMC sampling is almost always used in combination with exact methods. This is especially true for the Gibbs sampler, where we need to generate from the full conditionals. Which exact method to use depends on the particular problem; there are literally hundreds of methods available. The current bible is Devroye's book, which is unfortunately not available online. Besides the only truly universal exact generator (the inverse cdf method introduced in lecture I), the most flexible exact algorithm is the rejection method.
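
As a quick reminder of the inverse cdf method, here is a minimal Python sketch (the exponential example and the function name are mine, chosen purely for illustration): if F is a continuous cdf and U is uniform(0,1), then F^{-1}(U) has cdf F.

import math
import random

def exponential_inverse_cdf(rate=1.0):
    """One Exponential(rate) draw by inverting its cdf F(x) = 1 - exp(-rate*x)."""
    u = random.random()                  # u ~ uniform(0,1)
    return -math.log(1.0 - u) / rate     # x = F^{-1}(u)

samples = [exponential_inverse_cdf(2.0) for _ in range(100000)]
print(sum(samples) / len(samples))       # should be close to 1/rate = 0.5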

The Rejection Method

The idea, I think first implemented by von Neumann, is very simple. To sample from f(x), find another, simpler density g(x) from which you know how to sample and such that c g(x) ≥ f(x) for some c ≥ 1 (we say that c g envelopes f). Like in the following picture:

[Figure env90.gif: the envelope c g(x) lying above f(x).]

Then generate uniformly under the graph of the envelope and accept only the samples that fall under the graph of f. In other words, reject a sample if it falls between the envelope and the graph of f.

Copying from the bible:

Theorem 1 Let f and g be densities on R^p with f(x) ≤ c g(x) for all x ∈ R^p, for some c ≥ 1. Then the following algorithm

The Rejection Method

{

REPEAT

{

x ← sample from g
u ← unif(0,1)
y ← c g(x) / f(x)

}

UNTIL u y ≤ 1
RETURN x

}

will produce a sample X = x with density f(x).
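
Before proving this, here is how the algorithm reads as a minimal Python sketch (the function names are mine; f, g, sample_g and the constant c are whatever envelope happens to be available):

import random

def rejection_sample(f, g, sample_g, c):
    """One draw with density f, given a density g and a constant c with f(x) <= c*g(x) everywhere."""
    while True:
        x = sample_g()               # x ~ g
        u = random.random()          # u ~ uniform(0,1)
        if u * c * g(x) <= f(x):     # same test as u*y <= 1 with y = c*g(x)/f(x)
            return x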

Proof: Consider the following two lemmas

Lemma 1 Let X be a random p-vector with density f(x). Let U be a uniform(0,1) r.v. independent of X, and let c > 0. Then,

I. (X, c U f(X)) is uniform on A = {(x,u) : x ∈ R^p, 0 ≤ u ≤ c f(x)}.

II. If (X,U) is uniform on A, then X has density f(x).

Proof:

I. Let B ⊂ A be measurable. Then,

$$
P[(X, cUf(X)) \in B]
= \int P[(x, cUf(x)) \in B \mid X = x]\, f(x)\, dx
= \int \left\{ \int_{B_x} \frac{1}{c f(x)}\, du \right\} f(x)\, dx
= \frac{1}{c} \iint_B du\, dx,
$$

where B_x = {u : (x,u) ∈ B}, and the second equality follows from Tonelli's theorem, the independence of X and U, and the fact that c f(x) U is uniform(0, c f(x)). But,

$$
|A| = \iint_A du\, dx
= \int_{\mathbb{R}^p} \left( \int_0^{c f(x)} du \right) dx = c,
$$

thus, for any measurable B ⊂ A we have,

$$
P[(X, cUf(X)) \in B] = \frac{|B|}{|A|},
$$

which means that (X, cUf(X)) is uniform on A.

II. We just need to show that, for all measurable B, P[X ∈ B] = ∫_B f(x) dx. But, since (X,U) is uniform on A,

$$
P[X \in B]
= P[X \in B,\; 0 \le U \le c f(X)]
= P[(X,U) \in B_1], \qquad B_1 = \{(x,u) : x \in B,\; 0 \le u \le c f(x)\},
$$

and

$$
P[(X,U) \in B_1]
= \frac{\iint_{B_1} du\, dx}{\iint_{A} du\, dx}
= \frac{1}{c} \int_B c f(x)\, dx
= \int_B f(x)\, dx.
$$

Lemma 2 Let X1, X2, … be iid random vectors in R^p. Let A ⊂ R^p be such that P[X1 ∈ A] = a > 0. Let Y be the first Xi that falls in A. Then, for all measurable B ⊂ R^p,

$$
P[Y \in B] = \frac{P[X_1 \in A \cap B]}{a}.
$$

Moreover, if X1 is uniform on A0 ⊃ A then Y is uniform on A.

Proof:

$$
P[Y \in B]
= \sum_{i=1}^{\infty} P[X_1 \notin A, \ldots, X_{i-1} \notin A,\; X_i \in B \cap A]
= \sum_{i=1}^{\infty} (1-a)^{i-1}\, P[X_1 \in B \cap A]
= \frac{P[X_1 \in B \cap A]}{1 - (1-a)}.
$$

Also, if X1 is uniform on A0 ⊃ A then for all measurable B,

$$
P[Y \in B]
= \frac{P[X_1 \in A \cap B \cap A_0]}{a}
= \frac{|A \cap B \cap A_0| / |A_0|}{|A_0 \cap A| / |A_0|}
= \frac{|A \cap B|}{|A|},
$$

therefore, Y is uniform on A.

We are now ready for the Theorem.

Proof: (that the Rejection Method is valid). Inside the loop the algorithm generates (by Lemma 1, I, applied to the density g) pairs (X, cUg(X)) uniform on C = {(x,u) : x ∈ R^p, 0 ≤ u ≤ c g(x)}. The acceptance condition u y ≤ 1 is exactly cUg(X) ≤ f(X), so at exit time (by Lemma 2) the accepted pair (X, cUg(X)) is uniform on A = {(x,u) : x ∈ R^p, 0 ≤ u ≤ f(x)}. Notice that, by the assumption that f(x) ≤ c g(x) (i.e. that c g envelopes f), C ⊃ A. Thus, (by Lemma 1, II, with c = 1) X has density f(x).

Best g is f itself

Let N be the number of pairs (x,u) generated by the rejection algorithm until it exits. We have,

$$
P[N \ge i] = P[\text{reject at least } (i-1) \text{ pairs}] = P[UY > 1]^{i-1} = (1-a)^{i-1},
$$

where,

$$
a = P\!\left[ U\, \frac{c g(X)}{f(X)} \le 1 \right]
= \int P[U c g(x) \le f(x)]\, g(x)\, dx
= \int \frac{f(x)}{c g(x)}\, g(x)\, dx = \frac{1}{c}.
$$

The expected number of pairs generated by the algorithm is ⟨N⟩, where

$$
\langle N \rangle = \sum_{i=1}^{\infty} P[N \ge i]
= \sum_{i=1}^{\infty} (1-a)^{i-1}
= \frac{1}{1-(1-a)} = \frac{1}{a} = c.
$$

Thus, c is the expected number of pairs generated before acceptance (so c − 1 is the expected number of rejections) and therefore it should be kept as small as possible, i.e. as close as possible to its minimum value of 1. Since f(x) ≤ c g(x), in order for c to be small g must be close to f.

Example: To sample from the standard Gaussian we can use the rejection method with g the Laplace density. Notice that,

$$
f(x) = (2\pi)^{-1/2} \exp(-x^2/2)
$$

and in order to get an upper bound for f we need a lower bound for the energy, i.e. for x^2/2. But that follows easily from,

$$
\frac{1}{2}(|x| - 1)^2 = \frac{x^2}{2} + \frac{1}{2} - |x| \ge 0,
$$

from where we get,

$$
\exp(-x^2/2) \le \exp\!\left( \frac{1}{2} - |x| \right)
= (2\pi)^{1/2} \left( \frac{2e}{\pi} \right)^{1/2} \left( \frac{1}{2}\exp(-|x|) \right).
$$

The last term in parentheses above is the density of the Laplace distribution, which the above inequality shows to envelope the N(0,1) with,

$$
c = \left( \frac{2e}{\pi} \right)^{1/2} \approx 1.3155.
$$

Javascript Demo: implementation of the above in Javascript. Solution by Ke Yang.
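
For readers who prefer to see it spelled out, here is a rough Python sketch of the same Laplace-envelope construction (this is my own illustration, not the linked Javascript demo); the average number of proposals per accepted draw should come out close to c ≈ 1.3155.

import math
import random

C = math.sqrt(2.0 * math.e / math.pi)        # envelope constant, about 1.3155

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def sample_laplace():
    # An exponential magnitude with a random sign gives a Laplace draw.
    e = -math.log(1.0 - random.random())
    return e if random.random() < 0.5 else -e

def sample_std_normal():
    """One N(0,1) draw by rejection from the Laplace envelope; also counts proposals."""
    n = 0
    while True:
        n += 1
        x = sample_laplace()
        u = random.random()
        if u * C * laplace_pdf(x) <= std_normal_pdf(x):
            return x, n

pairs = [sample_std_normal() for _ in range(100000)]
print(sum(n for _, n in pairs) / len(pairs))   # average number of proposals, close to C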

There are several variations of the rejection method in the bible. One fairly general method that never rejects any samples is:

The Acceptance Complement Method

Suppose that we don't know how to envelope f but,

f(x) = f1(x) + f2(x)
with f1(x) ≥ 0, f2(x) ≥ 0 and f1(x) ≤ g(x), where g is a density. Furthermore, suppose that we know how to sample from g and from f2 (properly normalized); then,

The Acceptance Complement Method

{

x ← sample from g
u ← unif(0,1)
IF u > f1(x) / g(x) THEN x ← sample from f2 / ∫f2
RETURN x

}

Theorem 2 X = x is a sample from f.

Proof: Let a = ∫f2 and suppose that Y has density g. We have,

$$
P[X \in B]
= P\!\left[ Y \in B,\; U \le \frac{f_1(Y)}{g(Y)} \right]
+ P\!\left[ U > \frac{f_1(Y)}{g(Y)} \right] \int_B f_2(x)\, dx \,/\, a
$$
$$
= \int_B \frac{f_1(x)}{g(x)}\, g(x)\, dx
+ \left( 1 - \int \frac{f_1(y)}{g(y)}\, g(y)\, dy \right) \int_B f_2 / a
$$
$$
= \int_B f_1 + \int_B f_2 = \int_B f,
$$

where the last step uses 1 − ∫f1 = ∫f2 = a (since f1 + f2 = f integrates to 1).

Example: Take f1(x) = min{f(x), g(x)} and f2(x) = (f(x) − g(x))⁺, the positive part. Then, clearly f1(x) ≤ g(x) and f(x) = f1(x) + f2(x). So if we know how to sample from g and from f2, we are done.
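
As a minimal Python sketch of the algorithm above (the argument names are mine; sample_g draws from g and sample_f2 draws from f2 normalized to a density):

import random

def acceptance_complement(f1, g, sample_g, sample_f2):
    """One draw from f = f1 + f2, assuming 0 <= f1 <= g and that g is a density."""
    x = sample_g()                  # x ~ g
    u = random.random()             # u ~ uniform(0,1)
    if u > f1(x) / g(x):
        x = sample_f2()             # replace x by a draw from f2 / integral(f2)
    return x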

Another useful general split is available for almost flat densities on [−1,1], i.e. densities f(x) such that,

$$
\sup f(x) - \inf f(x) \le \frac{1}{2}.
$$

Then we may take,

$$
g(x) = \frac{1}{2} \quad \text{for } |x| \le 1,
$$
$$
f_1(x) = f(x) - \left( M - \frac{1}{2} \right), \quad \text{with } M = \sup f(x),
$$
$$
f_2(x) = M - \frac{1}{2} \quad \text{for } |x| \le 1,
$$

which works since g and f2 are proportional to densities uniform on [−1,1] (so easy to sample from). That f1 is bounded above by g follows from f ≤ M; that f1 ≥ 0 follows from the almost flat condition; and that f2 ≥ 0 follows from the fact that,

$$
0 \le \inf f(x) \le \frac{1}{2} \le \sup f(x) \le 1
$$

(the average value of a density on [−1,1] is 1/2). Hence, the method can be used to generate from many densities on [−1,1] that are symmetric about 0 with a single mode at 0, for which M = f(0) and inf f = f(1). For example we can generate from the truncated Cauchy for |x| ≤ 1. We have,

$$
f(x) = \frac{2}{\pi (1 + x^2)} \quad \text{for } |x| \le 1,
$$

which is almost flat since 2/π − 1/π = 1/π ≈ 0.318 < 0.5. Using the property that 1/X is Cauchy when X is Cauchy, we can generate a complete Cauchy by using the Acceptance Complement Method to generate from the truncated Cauchy and then, with probability 1/2, returning 1/X instead of X.

Example: Javascript implementation of the Cauchy with the above method.
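
Separately from the linked Javascript demo, here is a rough Python sketch of the construction, purely as an illustration (here M = 2/π, g is the uniform density 1/2 on [−1,1], and f2 is the constant M − 1/2, so f2 normalized is again uniform on [−1,1]):

import math
import random

M = 2.0 / math.pi                          # sup of the truncated Cauchy density on [-1,1]

def f(x):
    # Cauchy density truncated (and renormalized) to [-1,1]
    return 2.0 / (math.pi * (1.0 + x * x))

def f1(x):
    return f(x) - (M - 0.5)

def sample_truncated_cauchy():
    """Acceptance-complement draw from the Cauchy truncated to [-1,1]."""
    x = random.uniform(-1.0, 1.0)          # sample from g = 1/2 on [-1,1]
    u = random.random()
    if u > f1(x) / 0.5:                    # g(x) = 1/2
        x = random.uniform(-1.0, 1.0)      # f2 is constant, so f2 normalized is uniform
    return x

def sample_cauchy():
    """Full standard Cauchy: 1/X is again Cauchy and P(|X| <= 1) = 1/2."""
    x = sample_truncated_cauchy()
    return x if random.random() < 0.5 else 1.0 / x

print([round(sample_cauchy(), 3) for _ in range(5)])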

A Trivial Perfect MCMC Method

If a good envelope for f is available then the rejection method is preferable to asymptotic methods based on Markov chains. The problem is that good envelopes are often not easy to find. A bad envelope can be easier to find and still be useful when combined with a Markov chain method. For example, it may be possible to show that f(x) ≤ 10000 g(x), which means that we can use the rejection method but expect to generate, on average, 10000 (X,U) pairs before accepting one vector X. In this case we can still use the rejection method to generate a single observation of X, which is guaranteed to have the correct density f. Then, use the observation generated by the rejection method as the initial point for a Markov chain method with stationary distribution f, and harvest the complete path of the chain, which is now sampling from its stationary distribution from the very first step (the successive states are dependent, of course, but each one has exactly the distribution f).
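
A minimal Python sketch of this idea, under the assumption that we have a target density f, a crude envelope c_big * g, and a symmetric random-walk Metropolis kernel (all names and the proposal scale are mine):

import random

def exact_initial_point(f, g, sample_g, c_big):
    """Slow but exact: one draw from f by rejection with a crude constant c_big."""
    while True:
        x = sample_g()
        if random.random() * c_big * g(x) <= f(x):
            return x

def metropolis_path(f, x0, n_steps, step=1.0):
    """Random-walk Metropolis chain with stationary density f, started at the exact
    draw x0, so every state of the path has marginal density f."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        y = x + random.gauss(0.0, step)                # symmetric proposal
        if random.random() < min(1.0, f(y) / f(x)):    # Metropolis acceptance ratio
            x = y
        path.append(x)
    return path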

