Optimal Spatial Prediction with Kriging

Robert Nishihara | Blog, Machine Learning, Statistics

Suppose we are modeling a spatial process (for instance, the amount of rainfall around the world, the distribution of natural resources, or the population density of an endangered species). We’ve measured the latent function Z at some locations {\bf s}_1, \ldots, {\bf s}_N, and we’d like to predict the function’s value at some new location {\bf s}_0. Kriging is a technique for extrapolating our measurements to arbitrary locations. For an in-depth discussion, see Cressie and Wikle (2011). Here I derive Kriging in a simplified case.

I will assume that Z is an intrinsically stationary process. In other words, there exists some semivariogram \gamma({\bf h}) such that

    \[ \text{var}[Z({\bf s}+{\bf h}) - Z({\bf s})] = 2\gamma({\bf h}) . \]
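To make the semivariogram concrete, here is a minimal Python sketch of one standard choice, the isotropic exponential model; the function name and the `sill` and `range_` parameters are illustrative assumptions of mine, not anything prescribed by the derivation.

    import numpy as np

    def exponential_semivariogram(h, sill=1.0, range_=1.0):
        """Isotropic exponential semivariogram: gamma(h) = sill * (1 - exp(-||h|| / range_)).

        `h` may be a single displacement vector or an array of displacement
        vectors; only its Euclidean length enters, so the model is isotropic.
        """
        dist = np.linalg.norm(h, axis=-1)
        return sill * (1.0 - np.exp(-dist / range_))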

Furthermore, I will assume that the process is isotropic (i.e. that \gamma({\bf h}) is a function only of \|{\bf h}\|). As Andy described here, the existence of a covariance function implies intrinsic stationarity. In addition, I will assume that the process has a constant mean, \mathbb E[Z({\bf s})] = \mu. We would like to estimate Z({\bf s}_0) with a linear combination of our current observations. Our estimator will be

    \[ \hat{Z} = \sum_{n=1}^N \lambda_n Z({\bf s}_n) ,\]

where the weights \lambda_n can be positive or negative. We further require that \sum_n \lambda_n = 1 so that our estimate is unbiased: since the mean is constant, \mathbb E[\hat{Z}] = \sum_n \lambda_n \mu = \mu = \mathbb E[Z({\bf s}_0)]. We would like to choose the weights \lambda_n so as to minimize the mean-squared predictive error

    \[ \text{MSPE}(\lambda_1, \ldots, \lambda_N) = \mathbb E[(\hat{Z} - Z({\bf s}_0) )^2]  .\]

Let \gamma_{nm} denote \gamma({\bf s}_n - {\bf s}_m) for n, m \in \{0, 1, \ldots, N\}. Expanding the expression for the mean-squared predictive error, we get

    \[  \sum_{n,m} \lambda_n \lambda_m \mathbb E[Z({\bf s}_n)Z({\bf s}_m)] + \mathbb E[Z({\bf s}_0)^2]  - 2\sum_n \lambda_n \mathbb E[Z({\bf s}_n) Z({\bf s}_0)] . \]

Adding and subtracting \sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2] and using the constraint \sum_n \lambda_n = 1, we can break this expression into A + B, where

    \begin{eqnarray*} A & = & -\sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2] + \sum_{n,m} \lambda_n\lambda_m \mathbb E[Z({\bf s}_n)Z({\bf s}_m)] \\ & = & -\frac12 \sum_{n,m} \lambda_n\lambda_m \mathbb E[(Z({\bf s}_n) - Z({\bf s}_m))^2] \\ & = & -\sum_{n,m} \lambda_n\lambda_m \gamma_{nm} , \end{eqnarray*}

and

    \begin{eqnarray*} B & = & \mathbb E[Z({\bf s}_0)^2] - 2\sum_n \lambda_n \mathbb E[Z({\bf s}_n)Z({\bf s}_0)] + \sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2] \\ & = & \sum_n \lambda_n \mathbb E[(Z({\bf s}_0) - Z({\bf s}_n))^2] \\ & = & 2 \sum_n \lambda_n \gamma_{0n}. \end{eqnarray*}

We minimize the quantity A+B subject to the constraint \sum_n \lambda_n = 1 using Lagrange multipliers. To simplify the notation, define the matrix \Gamma by \Gamma_{nm} = \gamma_{nm}, the vector \boldsymbol\lambda = (\lambda_1, \ldots, \lambda_N)^{\mathsf T}, the vector \boldsymbol\gamma_0 = (\gamma_{01}, \ldots, \gamma_{0N})^{\mathsf T}, and the vector {\bf 1} = (1, \ldots, 1)^{\mathsf T}. Then the mean-squared predictive error is given by

    \[ -\boldsymbol\lambda^{\mathsf T} \Gamma \boldsymbol\lambda + 2\boldsymbol\lambda^{\mathsf T} \boldsymbol\gamma_0 .\]
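As a quick sanity check on this matrix form, here is a small Python sketch that evaluates the objective for a given weight vector; the names `kriging_mspe`, `Gamma`, and `gamma0` are mine, and the weights are assumed to satisfy \sum_n \lambda_n = 1.

    import numpy as np

    def kriging_mspe(lam, Gamma, gamma0):
        """Matrix form of the mean-squared predictive error,
        -lam' Gamma lam + 2 lam' gamma0 (valid when sum(lam) == 1)."""
        lam = np.asarray(lam, dtype=float)
        return -lam @ Gamma @ lam + 2.0 * lam @ gamma0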

Incorporating the constraint with a Lagrange multiplier \alpha, we form the objective

    \[ \Phi(\boldsymbol\lambda, \alpha) =  -\boldsymbol\lambda^{\mathsf T} \Gamma \boldsymbol\lambda + 2\boldsymbol\lambda^{\mathsf T} \boldsymbol\gamma_0 - \alpha({\bf 1}^{\mathsf T}\boldsymbol\lambda - 1) .\]

Differentiating \Phi with respect to \alpha and setting the result to zero gives back our constraint. Differentiating with respect to each \lambda_n, setting these derivatives to zero, and concatenating the resulting equations into matrix form gives

    \begin{eqnarray*} & & -2\Gamma\boldsymbol\lambda + 2\boldsymbol\gamma_0 = \alpha {\bf 1} \\ & \Leftrightarrow & \boldsymbol\lambda = \Gamma^{-1}(\boldsymbol\gamma_0 - \alpha {\bf 1}/2) . \end{eqnarray*}

Incorporating the constraint gives

    \begin{eqnarray*} 1 & = & {\bf 1}^{\mathsf T} \boldsymbol\lambda \\ & = & {\bf 1}^{\mathsf T} \Gamma^{-1}(\boldsymbol\gamma_0 - \alpha {\bf 1}/2) . \end{eqnarray*}

Solving for \alpha and plugging this back into our formula for \boldsymbol\lambda, we find that

    \[ \boldsymbol\lambda = \Gamma^{-1}\left( \boldsymbol\gamma_0 - \frac{{\bf 1}^{\mathsf T}\Gamma^{-1}\boldsymbol\gamma_0 - 1}{{\bf 1}^{\mathsf T}\Gamma^{-1}{\bf 1}}{\bf 1} \right) .\]

This gives us our optimal Kriging predictor.
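To close the loop, here is a minimal Python sketch that computes these weights directly from the closed-form expression above and plugs them into \hat{Z} = \sum_n \lambda_n Z({\bf s}_n). The function `kriging_weights`, the exponential semivariogram, and the toy data are all illustrative assumptions, and like the derivation it presumes \Gamma is invertible.

    import numpy as np

    def kriging_weights(S, s0, semivariogram):
        """Ordinary kriging weights for predicting at s0 from the sites in S.

        S  : (N, d) array of observed locations s_1, ..., s_N
        s0 : (d,) array, the prediction location
        semivariogram : callable mapping displacement vectors to gamma values
        """
        Gamma = semivariogram(S[:, None, :] - S[None, :, :])  # Gamma_{nm} = gamma(s_n - s_m)
        gamma0 = semivariogram(S - s0)                         # (gamma_{01}, ..., gamma_{0N})
        ones = np.ones(len(S))

        # lambda = Gamma^{-1}(gamma_0 - (alpha/2) 1), with alpha/2 chosen so the weights sum to 1
        Ginv_gamma0 = np.linalg.solve(Gamma, gamma0)
        Ginv_ones = np.linalg.solve(Gamma, ones)
        alpha_half = (ones @ Ginv_gamma0 - 1.0) / (ones @ Ginv_ones)
        return Ginv_gamma0 - alpha_half * Ginv_ones

    # Toy usage with an isotropic exponential semivariogram (an illustrative choice).
    def gamma(h):
        return 1.0 - np.exp(-np.linalg.norm(h, axis=-1))

    S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]])  # observed sites
    z = np.array([1.2, 0.9, 1.5, 1.1, 0.7])                                     # observed values Z(s_n)
    s0 = np.array([0.5, 0.5])                                                    # prediction site
    lam = kriging_weights(S, s0, gamma)
    print(lam.sum())   # equals 1 up to floating-point error (the unbiasedness constraint)
    print(lam @ z)     # the kriging prediction

In practice one would typically solve the bordered (N+1) \times (N+1) kriging system for \boldsymbol\lambda and \alpha jointly rather than applying \Gamma^{-1} twice, but the version above mirrors the formula as derived here.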