This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function $\pi(x)$, with $x \in \mathcal{X}$, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state $x$ to a proposed state $x'$ with probability $\min\left(1, \pi(x') / \pi(x)\right)$.

But what if we cannot evaluate $\pi(x)$ exactly? Such a situation might arise if we are given a joint density function $\pi(x, z)$, with $z \in \mathcal{Z}$, and we must marginalize out $z$ in order to compute $\pi(x)$. In this situation, we may only be able to approximate

$$\pi(x) = \int_{\mathcal{Z}} \pi(x, z) \, dz,$$

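for instance with importance sampling. If we draw $N$ i.i.d. variables $z_{1:N} = (z_1, \ldots, z_N)$ from the distribution with density function $q(z)$, then our importance sampling estimate will be

$$\hat{\pi}(x; z_{1:N}) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(x, z_n)}{q(z_n)}.$$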
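As a rough illustration of the importance sampling estimate, here is a minimal sketch on a toy model. Everything specific in it is an assumption made for the example and does not come from the post: the joint density (a standard normal in the first variable, and a unit-variance normal centered on it in the second), the wide normal used as the importance distribution, and all function names. The toy model is chosen because its marginal is a standard normal, so the estimate can be compared against the exact value.

```python
import math
import random


def normal_pdf(v, mean, std):
    """Density of a normal distribution with the given mean and std at v."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def joint_density(x, z):
    # Toy joint density (assumption): x ~ N(0, 1) and z | x ~ N(x, 1),
    # so the marginal density of x is exactly N(0, 1).
    return normal_pdf(x, 0.0, 1.0) * normal_pdf(z, x, 1.0)


Q_STD = 3.0  # std of the importance distribution (assumption: a wide normal)


def q_density(z):
    return normal_pdf(z, 0.0, Q_STD)


def estimate_marginal(x, zs):
    """Average of joint(x, z_n) / q(z_n) over the auxiliary draws zs."""
    return sum(joint_density(x, z) / q_density(z) for z in zs) / len(zs)


rng = random.Random(0)
x = 0.5
zs = [rng.gauss(0.0, Q_STD) for _ in range(20000)]

estimate = estimate_marginal(x, zs)
exact = normal_pdf(x, 0.0, 1.0)
print("estimate:", estimate, "exact:", exact)
```

With many auxiliary draws the two printed numbers should be close. With only a handful of draws the estimate is noisy, which is the regime the rest of the post is concerned with.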

What happens if we go ahead and run Metropolis-Hastings on the estimated density function $\hat{\pi}$? If we generate new variables $z_1, \ldots, z_N$ once for each proposal, then we can view this as performing a random walk through $\mathcal{X} \times \mathcal{Z}^N$. To be clear, if we are at the current state $(x, z_{1:N})$, we propose the state $(x', z'_{1:N})$ by drawing $x'$ from the original spherical proposal distribution centered on $x$ and by drawing each $z'_n$ from $q$. We then accept the proposed state with probability

$$\min\left(1, \frac{\hat{\pi}(x'; z'_{1:N})}{\hat{\pi}(x; z_{1:N})}\right).$$
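The procedure just described can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not a reference implementation: the toy joint density, the importance distribution, the step size, the number of auxiliary draws, and all names are hypothetical choices made for the example. The one detail that matters is that the estimate attached to the current state is stored and reused, never recomputed, since it is part of the state of the chain on the extended space.

```python
import math
import random


def normal_pdf(v, mean, std):
    """Density of a normal distribution with the given mean and std at v."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


Q_STD = 3.0  # std of the importance distribution q (assumption)


def estimate_marginal(x, n_aux, rng):
    """Importance sampling estimate of the marginal density at x.

    Toy joint (assumption): x ~ N(0, 1), z | x ~ N(x, 1), so the exact
    marginal of x is N(0, 1). Fresh auxiliary draws are made on each call.
    """
    total = 0.0
    for _ in range(n_aux):
        z = rng.gauss(0.0, Q_STD)
        joint = normal_pdf(x, 0.0, 1.0) * normal_pdf(z, x, 1.0)
        total += joint / normal_pdf(z, 0.0, Q_STD)
    return total / n_aux


def pseudo_marginal_mh(n_steps, n_aux, step_std, rng):
    x = 0.0
    pi_hat = estimate_marginal(x, n_aux, rng)  # estimate tied to the current state
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step_std)  # symmetric (spherical) proposal
        pi_hat_new = estimate_marginal(x_new, n_aux, rng)
        # Accept with probability min(1, pi_hat_new / pi_hat).
        if rng.random() < pi_hat_new / pi_hat:
            x, pi_hat = x_new, pi_hat_new  # keep the accepted estimate
        samples.append(x)
    return samples


samples = pseudo_marginal_mh(n_steps=40000, n_aux=10, step_std=1.0, rng=random.Random(1))
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("sample mean:", mean, "sample variance:", var)
```

For this toy model the sample mean and variance should land near 0 and 1, the moments of the exact marginal, even though each density evaluation uses only ten auxiliary draws.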

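Since the spherical proposal is symmetric, the ratio of transition probabilities is given by

$$\frac{T\left((x', z'_{1:N}) \to (x, z_{1:N})\right)}{T\left((x, z_{1:N}) \to (x', z'_{1:N})\right)} = \frac{\prod_{n=1}^{N} q(z_n)}{\prod_{n=1}^{N} q(z'_n)},$$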

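we can view this approximate Metropolis-Hastings algorithm on $\mathcal{X}$ as an exact Metropolis-Hastings algorithm on $\mathcal{X} \times \mathcal{Z}^N$ with stationary distribution given by

$$p(x, z_{1:N}) \propto \hat{\pi}(x; z_{1:N}) \prod_{n=1}^{N} q(z_n).$$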

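For each $x \in \mathcal{X}$, define the function $w_x : \mathcal{Z}^N \to \mathbb{R}$ by

$$w_x(z_{1:N}) = \frac{\hat{\pi}(x; z_{1:N})}{\pi(x)}.$$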

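This is essentially the random variable $\hat{\pi}(x; z_{1:N}) / \pi(x)$, the ratio of the estimate to the true density. It follows that

$$p(x, z_{1:N}) \propto \pi(x) \, w_x(z_{1:N}) \prod_{n=1}^{N} q(z_n).$$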

But

$$\int_{\mathcal{Z}^N} w_x(z_{1:N}) \prod_{n=1}^{N} q(z_n) \, dz_{1:N} = 1. \tag{1}$$

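Indeed,

$$\int_{\mathcal{Z}^N} w_x(z_{1:N}) \prod_{n=1}^{N} q(z_n) \, dz_{1:N} = \frac{1}{\pi(x)} \cdot \frac{1}{N} \sum_{n=1}^{N} \int_{\mathcal{Z}} \frac{\pi(x, z_n)}{q(z_n)} \, q(z_n) \, dz_n = \frac{1}{\pi(x)} \cdot \frac{1}{N} \sum_{n=1}^{N} \pi(x) = 1.$$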

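The important part is that $\int_{\mathcal{Z}^N} w_x(z_{1:N}) \prod_{n=1}^{N} q(z_n) \, dz_{1:N}$ is independent of $x$. Therefore, equation (1) tells us that integrating the stationary distribution over $z_{1:N}$ leaves a marginal proportional to $\pi(x)$, so the stationary distribution of our Markov chain has the desired marginal distribution. In other words, we can run Metropolis-Hastings on the approximation $\hat{\pi}$ of the true density function and still get the correct outcome.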
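The key property, that the ratio of the estimate to the true density averages to one regardless of the state, can also be checked numerically. The sketch below is a hedged illustration on an assumed toy model (a standard normal marginal, a unit-variance normal conditional, and a wide normal importance distribution); the names, the number of auxiliary draws, and the number of replicates are all hypothetical choices for the example.

```python
import math
import random


def normal_pdf(v, mean, std):
    """Density of a normal distribution with the given mean and std at v."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


Q_STD = 3.0  # std of the importance distribution q (assumption)


def weight(x, n_aux, rng):
    """One draw of (importance sampling estimate at x) / (exact marginal at x).

    Toy joint (assumption): x ~ N(0, 1), z | x ~ N(x, 1), so the exact
    marginal density at x is normal_pdf(x, 0, 1).
    """
    total = 0.0
    for _ in range(n_aux):
        z = rng.gauss(0.0, Q_STD)
        joint = normal_pdf(x, 0.0, 1.0) * normal_pdf(z, x, 1.0)
        total += joint / normal_pdf(z, 0.0, Q_STD)
    return (total / n_aux) / normal_pdf(x, 0.0, 1.0)


rng = random.Random(2)
n_aux = 5          # deliberately small, so each individual estimate is noisy
n_replicates = 20000

mean_weights = {}
for x in (-1.0, 0.0, 2.0):
    draws = [weight(x, n_aux, rng) for _ in range(n_replicates)]
    mean_weights[x] = sum(draws) / n_replicates

for x, m in mean_weights.items():
    print("x =", x, "average weight:", m)
```

Each printed average should be close to 1 for every state tried, even though a single weight built from five auxiliary draws can be far from 1.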