# Pseudo-marginal MCMC

Robert Nishihara

[latexpage]This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function $\pi({\bf x})$, with ${\bf x} \in \mathcal X$, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state ${\bf x}$ to a proposed state ${\bf x}’$ with probability $\min(1, \pi({\bf x}’)/\pi({\bf x}))$ .

But what if we cannot evaluate $\pi({\bf x})$ exactly? Such a situation might arise if we are given a joint density function $\pi({\bf x}, {\bf z})$, with ${\bf z} \in \mathcal Z$, and we must marginalize out ${\bf z}$ in order to compute $\pi({\bf x})$. In this situation, we may only be able to approximate

$\pi({\bf x}) = \int \pi({\bf x},{\bf z}) \, \mathrm d{\bf z} ,$

for instance with importance sampling. If we draw i.i.d. variables $\{{\bf z}_1, \ldots, {\bf z}_N\}$ from the distribution the density function $q({\bf z})$, then our importance sampling estimate will be

$\tilde{\pi}({\bf x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)} .$

What happens if we go ahead and run Metropolis-Hastings on the estimated density function $\tilde{\pi}({\bf x})$? If we generate new variables $\{{\bf z}_1, \ldots, {\bf z}_N\}$ once for each proposal, then we can view this as performing a random walk through $\mathcal X \times \mathcal Z^N$. To be clear, if we are at the current state $({\bf x},{\bf Z}) = ({\bf x}, {\bf z}_1, \ldots, {\bf z}_N)$, we propose the state $({\bf x}’,{\bf Z}’) = ({\bf x’}, {\bf z}_1′, \ldots, {\bf z}_N’)$ by drawing ${\bf x’}$ from the original spherical proposal distribution centered on ${\bf x}$ and by drawing each ${\bf z}_n’$ from $q({\bf z})$. We then accept the proposed state with probability

\begin{eqnarray*}

\frac{\sum_n \frac{\pi({\bf x}’, {\bf z}_n’)}{q({\bf z}_n’)}}{\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}} & = & \frac{\left(\sum_n \frac{\pi({\bf x}’, {\bf z}_n’) }{q({\bf z}_n’)}\right)\prod_n q({\bf z}_n’)}{\left(\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}\right)\prod_n q({\bf z}_n)} \cdot \frac{\prod_n q({\bf z}_n)}{\prod_n q({\bf z}_n’)}

\end{eqnarray*}

Since the ratio of transition probabilities is given by

$\frac{P[({\bf x}’,{\bf Z}’) \to ({\bf x},{\bf Z})]}{P[({\bf x},{\bf Z}) \to ({\bf x}’,{\bf Z}’)]} = \frac{\prod_n q({\bf z}_n)}{\prod_n q({\bf z}_n’)} ,$

we can view this approximate Metropolis-Hastings algorithm on $\mathcal X$ as an exact Metropolis-Hastings algorithm on $\mathcal X \times \mathcal Z^N$ with stationary distribution given by

$p({\bf x}, {\bf Z}) \propto \left( \sum_n \frac{\pi({\bf x}, {\bf z}_n) }{q({\bf z}_n)}\right) \prod_n q({\bf z}_n) .$

For each $x \in \mathcal X$, define the function $w_x \colon \mathcal Z^N \to \mathbb R$ by

$w_{{\bf x}}({\bf z}_1, \ldots, {\bf z}_N) = \frac{\frac{1}{N}\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}}{\pi({\bf x})} .$

This is essentially the random variable $\tilde{\pi}({\bf x})/\pi({\bf x})$. It follows that

$p({\bf x}, {\bf Z}) \propto \pi({\bf x}) \, w_{{\bf x}}({\bf z}_1, \ldots, {\bf z}_N) \, \prod_n q({\bf z}_n) .$

But

\begin{equation}

\int p({\bf x}, {\bf Z}) \, \mathrm d{\bf Z} \propto \pi({\bf x}) \, \mathbb E[w_{{\bf x}}]  = \pi({\bf x}) .

\end{equation}

Indeed,

\begin{eqnarray*}

\mathbb E[w_{{\bf x}}] & = & \frac{1}{N} \sum_{n = 1}^N \int \frac{\pi({\bf x}, {\bf z}_n)}{\pi({\bf x}) \, q({\bf z}_n)} \prod_{m=1}^N q({\bf z}_m) \, \mathrm d{\bf Z} \\

& = & \frac{1}{N} \sum_{n = 1}^N \int \pi({\bf z}_n \,|\, {\bf x})  \, \mathrm d{\bf z}_n \\

& = & 1.

\end{eqnarray*}

The important part is that $\mathbb E[w_{{\bf x}}]$ is independent of ${\bf x}$. Therefore, equation (1) tells us that the stationary distribution of our Markov chain has the desired marginal distribution. In other words, we can run Metropolis-Hastings on the approximation $\tilde{\pi}({\bf x})$ of the true density function and still get the correct outcome.