Pseudo-marginal MCMC
This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function $\pi(x)$, with $x \in \mathbb{R}^d$, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state $x$ to a proposed state $x'$ with probability $\min(1, \pi(x') / \pi(x))$.
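As a minimal sketch of this baseline algorithm (the function names, the log-density interface, and the standard-normal test target are my own illustrative assumptions, not from the post), random-walk Metropolis-Hastings with a spherical Gaussian proposal might look like:

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_steps, step_size=1.0, rng=None):
    """Random-walk Metropolis-Hastings with a spherical Gaussian proposal.

    log_pi: function returning the log of the (possibly unnormalized)
    target density pi at a point x.
    """
    rng = np.random.default_rng(rng)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_pi(x)
    samples = []
    for _ in range(n_steps):
        # Spherical proposal centered on the current state.
        x_prop = x + step_size * rng.standard_normal(x.shape)
        log_p_prop = log_pi(x_prop)
        # Accept with probability min(1, pi(x') / pi(x)).
        if np.log(rng.random()) < log_p_prop - log_p:
            x, log_p = x_prop, log_p_prop
        samples.append(x.copy())
    return np.array(samples)
```

For example, `metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), [0.0], 20000)` draws (correlated) samples from a standard normal.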
But what if we cannot evaluate $\pi(x)$ exactly? Such a situation might arise if we are given a joint density function $\pi(x, z)$, with $\pi(x) = \int \pi(x, z) \, dz$, and we must marginalize out $z$ in order to compute $\pi(x)$. In this situation, we may only be able to approximate $\pi(x)$, for instance with importance sampling. If we draw i.i.d. variables $z_1, \ldots, z_N$ from a distribution with density function $q(z \mid x)$, then our importance sampling estimate will be
$$\hat{\pi}(x) = \frac{1}{N} \sum_{n=1}^N \frac{\pi(x, z_n)}{q(z_n \mid x)}.$$
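To make this concrete, here is a sketch of such an estimator for a toy joint density (my own choice for illustration, not from the post): $z \sim N(0, 1)$ and $x \mid z \sim N(z, 1)$, so the exact marginal is $\pi(x) = N(0, 2)$, with importance distribution $q(z \mid x) = N(x/2, 1)$:

```python
import numpy as np

def normal_pdf(y, mean=0.0, sd=1.0):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def pi_hat(x, N, rng):
    """Importance-sampling estimate of pi(x) = integral of pi(x, z) dz.

    Toy joint (an assumption for illustration): z ~ N(0, 1), x | z ~ N(z, 1),
    so the exact marginal is pi(x) = N(0, 2).  Proposal: q(z | x) = N(x/2, 1).
    """
    z = x / 2.0 + rng.standard_normal(N)             # z_n drawn i.i.d. from q(. | x)
    joint = normal_pdf(z) * normal_pdf(x, mean=z)    # pi(x, z_n) = p(z_n) p(x | z_n)
    return np.mean(joint / normal_pdf(z, mean=x / 2.0))
```

Averaging many independent copies of `pi_hat(x, N, rng)` recovers the exact marginal density at $x$; this unbiasedness is the property the rest of the post relies on.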
What happens if we go ahead and run Metropolis-Hastings on the estimated density function $\hat{\pi}(x)$? If we generate new variables $z_1, \ldots, z_N$ once for each proposal, then we can view this as performing a random walk through the extended space of tuples $(x, z_1, \ldots, z_N)$. To be clear, if we are at the current state $(x, z_1, \ldots, z_N)$, we propose the state $(x', z_1', \ldots, z_N')$ by drawing $x'$ from the original spherical proposal distribution centered on $x$ and by drawing each $z_n'$ from $q(\cdot \mid x')$. We then accept the proposed state with probability
$$\min\left(1, \frac{\hat{\pi}(x')}{\hat{\pi}(x)}\right) = \min\left(1, \frac{\sum_{n=1}^N \pi(x', z_n') / q(z_n' \mid x')}{\sum_{n=1}^N \pi(x, z_n) / q(z_n \mid x)}\right).$$
Since the ratio of transition probabilities is given by
$$\frac{\prod_{n=1}^N q(z_n \mid x)}{\prod_{n=1}^N q(z_n' \mid x')}$$
(the spherical proposal for $x$ is symmetric, so its contribution cancels), we can view this approximate Metropolis-Hastings algorithm on $\hat{\pi}$ as an exact Metropolis-Hastings algorithm on the extended space with stationary distribution given by
$$p(x, z_1, \ldots, z_N) \propto \hat{\pi}(x) \prod_{n=1}^N q(z_n \mid x).$$
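Putting the pieces together, a sketch of the resulting pseudo-marginal sampler (the names and the `log_pi_hat` interface are my own assumptions) looks almost identical to ordinary Metropolis-Hastings: the only changes are that a fresh noisy estimate is computed for each proposal, and the current estimate is carried along with the state rather than recomputed.

```python
import numpy as np

def pseudo_marginal_mh(log_pi_hat, x0, n_steps, step_size=1.0, rng=None):
    """Pseudo-marginal Metropolis-Hastings.

    log_pi_hat(x, rng) must return the log of a noisy, *unbiased* estimate
    of pi(x), drawing fresh auxiliary variables z_1, ..., z_N internally.
    Crucially, the estimate for the current state is stored and reused
    until the state changes.
    """
    rng = np.random.default_rng(rng)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_pi_hat(x, rng)
    samples = []
    for _ in range(n_steps):
        x_prop = x + step_size * rng.standard_normal(x.shape)
        log_p_prop = log_pi_hat(x_prop, rng)   # fresh z_1', ..., z_N'
        # Accept with probability min(1, pi_hat(x') / pi_hat(x)).
        if np.log(rng.random()) < log_p_prop - log_p:
            x, log_p = x_prop, log_p_prop
        samples.append(x.copy())
    return np.array(samples)
```

Reusing the stored estimate for the current state, instead of re-estimating it each iteration, is exactly what makes this a Markov chain on the extended space described above.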
For each $x$, define the function $w_x$ by
$$w_x(z_1, \ldots, z_N) = \frac{1}{N} \sum_{n=1}^N \frac{\pi(x, z_n)}{q(z_n \mid x)}.$$
This is essentially the random variable $\hat{\pi}(x)$, viewed as a deterministic function of the auxiliary variables. It follows that
$$p(x, z_1, \ldots, z_N) \propto w_x(z_1, \ldots, z_N) \prod_{n=1}^N q(z_n \mid x).$$
But
$$\int w_x(z_1, \ldots, z_N) \prod_{n=1}^N q(z_n \mid x) \, dz_1 \cdots dz_N = \pi(x). \qquad (1)$$
Indeed,
$$\int \frac{1}{N} \sum_{n=1}^N \frac{\pi(x, z_n)}{q(z_n \mid x)} \prod_{m=1}^N q(z_m \mid x) \, dz_1 \cdots dz_N = \frac{1}{N} \sum_{n=1}^N \int \pi(x, z_n) \, dz_n = \pi(x).$$
The important part is that the left-hand side of equation (1) integrates to exactly $\pi(x)$, independent of $N$ and of the choice of $q$. Therefore, equation (1) tells us that the stationary distribution of our Markov chain has the desired marginal distribution: integrating out $z_1, \ldots, z_N$ leaves a density proportional to $\pi(x)$. In other words, we can run Metropolis-Hastings on the approximation $\hat{\pi}$ of the true density function $\pi$ and still get the correct outcome.
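Equation (1) is just the statement that the importance-sampling estimate is unbiased, and that is easy to check numerically, even for a tiny $N$. The sketch below uses a toy joint density (an illustrative assumption on my part): $z \sim N(0, 1)$, $x \mid z \sim N(z, 1)$, $q(z \mid x) = N(x/2, 1)$, whose exact marginal is $\pi(x) = N(0, 2)$.

```python
import numpy as np

def normal_pdf(y, mean=0.0, sd=1.0):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
x, N, reps = 0.7, 5, 200_000

# Draw `reps` independent copies of pi_hat(x), each built from only N = 5 samples.
z = x / 2.0 + rng.standard_normal((reps, N))                     # z_n ~ q(. | x)
weights = normal_pdf(z) * normal_pdf(x, mean=z) / normal_pdf(z, mean=x / 2.0)
pi_hat_draws = weights.mean(axis=1)                              # one pi_hat(x) per row

mean_est = pi_hat_draws.mean()                  # average of the noisy estimates
true_marginal = normal_pdf(x, sd=np.sqrt(2.0))  # exact pi(x) for the N(0, 2) marginal
print(mean_est, true_marginal)
```

Each individual `pi_hat` draw is quite noisy at $N = 5$, but their average matches the exact marginal closely, which is all that pseudo-marginal MCMC needs.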