This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function $\pi(x)$, with $x \in \mathcal{X}$, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state $x$ to a proposed state $x'$ with probability $\min\left(1, \pi(x')/\pi(x)\right)$.
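
The exact-density baseline described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `metropolis_hastings`, the step size, and the standard Gaussian target `log_pi` are all assumptions chosen for the example, and the density is handled in log space for numerical stability.

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a spherical Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    log_p = log_pi(x)
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        # Symmetric (spherical) proposal centered on the current state.
        x_prop = x + step_size * rng.standard_normal(x.size)
        log_p_prop = log_pi(x_prop)
        # Accept with probability min(1, pi(x') / pi(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = x_prop, log_p_prop
        samples[t] = x
    return samples

def log_pi(x):
    # Hypothetical target for illustration: unnormalized standard Gaussian.
    return -0.5 * np.sum(x ** 2)

samples = metropolis_hastings(log_pi, x0=np.zeros(2), n_steps=20_000)
```

Because the proposal is symmetric, only the ratio of target densities appears in the acceptance test, so the target does not need to be normalized.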
But what if we cannot evaluate $\pi(x)$ exactly? Such a situation might arise if we are given a joint density function $\pi(x, z)$, with $z \in \mathcal{Z}$, and we must marginalize out $z$ in order to compute $\pi(x)$. In this situation, we may only be able to approximate
$$\pi(x) = \int_{\mathcal{Z}} \pi(x, z) \, dz,$$
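for instance with importance sampling. If we draw $N$ i.i.d. variables $z_1, \ldots, z_N$ from the distribution with density function $q(z)$, then our importance sampling estimate will be
$$\tilde{\pi}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(x, z_n)}{q(z_n)}.$$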
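
To make the importance sampling estimate concrete, here is a small sketch on a toy model that is not from the post: $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, 1)$, so the exact marginal is $\mathcal{N}(0, 2)$ and the estimate can be checked. The importance distribution $q = \mathcal{N}(0, 2^2)$ and all function names are assumptions made for the example.

```python
import numpy as np

def normal_pdf(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def joint_density(x, z):
    # Toy joint density (an assumption): z ~ N(0, 1), x | z ~ N(z, 1).
    return normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, 1.0)

Q_STD = 2.0  # standard deviation of the importance distribution q (assumed)

def estimate_marginal(x, n_samples, rng):
    """Importance sampling estimate of pi(x) = integral of pi(x, z) dz."""
    z = rng.normal(0.0, Q_STD, size=n_samples)          # z_n ~ q
    weights = joint_density(x, z) / normal_pdf(z, 0.0, Q_STD)
    return weights.mean()

rng = np.random.default_rng(0)
x = 0.7
estimate = estimate_marginal(x, n_samples=100_000, rng=rng)
exact = normal_pdf(x, 0.0, np.sqrt(2.0))  # closed-form marginal for this toy model
print(estimate, exact)
```

With many samples the two printed values should agree closely; with only a handful of samples the estimate is noisy but still unbiased, which is the property the rest of the argument relies on.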
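What happens if we go ahead and run Metropolis-Hastings on the estimated density function $\tilde{\pi}$? If we generate new variables $z_1, \ldots, z_N$ once for each proposal, then we can view this as performing a random walk through $\mathcal{X} \times \mathcal{Z}^N$. To be clear, if we are at the current state $(x, \mathbf{z}) = (x, z_1, \ldots, z_N)$, we propose the state $(x', \mathbf{z}') = (x', z'_1, \ldots, z'_N)$ by drawing $x'$ from the original spherical proposal distribution centered on $x$ and by drawing each $z'_n$ from $q$. We then accept the proposed state with probability
$$\min\left(1, \frac{\tilde{\pi}(x')}{\tilde{\pi}(x)}\right) = \min\left(1, \frac{\sum_{n} \pi(x', z'_n)/q(z'_n)}{\sum_{n} \pi(x, z_n)/q(z_n)}\right).$$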
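
The procedure just described can be sketched in a few lines. The toy model here is an assumption for illustration ($z \sim \mathcal{N}(0, 1)$, $x \mid z \sim \mathcal{N}(z, 1)$, with importance distribution $q = \mathcal{N}(0, 2^2)$), as are the function names and tuning constants. The detail that matters is that the estimate attached to the current state is stored and reused, and fresh variables are drawn only for the proposal.

```python
import numpy as np

def normal_pdf(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

Q_STD = 2.0  # standard deviation of the importance distribution q (assumed)

def estimate_marginal(x, n_samples, rng):
    # Importance sampling estimate of pi(x) for the toy joint density
    # pi(x, z) = N(z; 0, 1) * N(x; z, 1).
    z = rng.normal(0.0, Q_STD, size=n_samples)
    joint = normal_pdf(z, 0.0, 1.0) * normal_pdf(x, z, 1.0)
    return np.mean(joint / normal_pdf(z, 0.0, Q_STD))

def pseudo_marginal_mh(x0, n_steps, n_samples=10, step_size=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = float(x0)
    pi_hat = estimate_marginal(x, n_samples, rng)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        x_prop = x + step_size * rng.standard_normal()
        # Fresh z'_1, ..., z'_N are drawn for the proposal only.
        pi_hat_prop = estimate_marginal(x_prop, n_samples, rng)
        # Accept with probability min(1, pi_hat_prop / pi_hat).
        if rng.uniform() * pi_hat < pi_hat_prop:
            # The estimate travels with the accepted state; it is not recomputed.
            x, pi_hat = x_prop, pi_hat_prop
        samples[t] = x
    return samples

samples = pseudo_marginal_mh(x0=0.0, n_steps=50_000)
print(samples.mean(), samples.var())  # exact marginal for this toy model is N(0, 2)
```

Re-estimating the density at the current state on every iteration would give a different algorithm, one that is not covered by the argument in this post.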
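Since the ratio of transition probabilities is given by
$$\frac{P\big((x', \mathbf{z}') \to (x, \mathbf{z})\big)}{P\big((x, \mathbf{z}) \to (x', \mathbf{z}')\big)} = \frac{\prod_{n} q(z_n)}{\prod_{n} q(z'_n)},$$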
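we can view this approximate Metropolis-Hastings algorithm on $\mathcal{X}$ as an exact Metropolis-Hastings algorithm on $\mathcal{X} \times \mathcal{Z}^N$ with stationary distribution given by
$$p(x, \mathbf{z}) \propto \left[\prod_{n=1}^{N} q(z_n)\right] \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(x, z_n)}{q(z_n)}.$$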
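
As a check on this claim, one can plug the stated stationary distribution into the Metropolis-Hastings acceptance ratio on the extended space. In the sketch below, $g(x' \mid x)$ denotes the symmetric spherical proposal density on $\mathcal{X}$, a symbol introduced only for this computation, and $p(x, \mathbf{z})$ is taken to be proportional to $\prod_n q(z_n) \cdot \frac{1}{N} \sum_n \pi(x, z_n)/q(z_n)$.

$$
\frac{p(x', \mathbf{z}') \, g(x \mid x') \prod_n q(z_n)}{p(x, \mathbf{z}) \, g(x' \mid x) \prod_n q(z'_n)}
= \frac{\left[\prod_n q(z'_n)\right] \left[\sum_n \pi(x', z'_n)/q(z'_n)\right] \prod_n q(z_n)}{\left[\prod_n q(z_n)\right] \left[\sum_n \pi(x, z_n)/q(z_n)\right] \prod_n q(z'_n)}
= \frac{\sum_n \pi(x', z'_n)/q(z'_n)}{\sum_n \pi(x, z_n)/q(z_n)}
$$

The products of $q$ cancel, as do the symmetric proposal densities and the factors of $1/N$, leaving exactly the ratio of importance sampling estimates used in the approximate algorithm.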
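For each $x \in \mathcal{X}$, define the function $w_x : \mathcal{Z}^N \to \mathbb{R}$ by
$$w_x(\mathbf{z}) = \frac{1}{\pi(x)} \cdot \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(x, z_n)}{q(z_n)}.$$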
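This is essentially the random variable $\tilde{\pi}(x)/\pi(x)$. It follows that
$$p(x) = \int p(x, \mathbf{z}) \, d\mathbf{z} \propto \pi(x) \int w_x(\mathbf{z}) \prod_{n=1}^{N} q(z_n) \, d\mathbf{z} = \pi(x) \, \mathbb{E}_{q}\left[w_x(\mathbf{z})\right]. \qquad (1)$$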
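The important part is that $\mathbb{E}_{q}\left[w_x(\mathbf{z})\right]$, the expectation of $w_x$ when each $z_n$ is drawn independently from $q$, is independent of $x$: it equals 1 because the importance sampling estimate is unbiased. Therefore, equation (1) tells us that the stationary distribution of our Markov chain has the desired marginal distribution $\pi(x)$. In other words, we can run Metropolis-Hastings on the approximation $\tilde{\pi}$ of the true density function and still get the correct outcome.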
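
The independence from $x$ rests on the importance sampling estimate being unbiased. Spelled out, with $w_x(\mathbf{z}) = \frac{1}{\pi(x)} \cdot \frac{1}{N} \sum_n \pi(x, z_n)/q(z_n)$ and under the assumption that $q(z) > 0$ wherever $\pi(x, z) > 0$:

$$
\mathbb{E}_{q}\left[w_x(\mathbf{z})\right]
= \frac{1}{\pi(x)} \cdot \frac{1}{N} \sum_{n=1}^{N} \int \frac{\pi(x, z_n)}{q(z_n)} \, q(z_n) \, dz_n
= \frac{1}{\pi(x)} \int \pi(x, z) \, dz
= 1
$$

Any estimator whose expectation is a constant multiple of $\pi(x)$, with the constant not depending on $x$, would serve equally well, since Metropolis-Hastings only needs the target up to normalization.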