Optimal Spatial Prediction with Kriging

Robert Nishihara Machine Learning, Statistics

[latexpage]Suppose we are modeling a spatial process (for instance, the amount of rainfall around the world, the distribution of natural resources, or the population density of an endangered species). We’ve measured the latent function $Z$ at some locations ${\bf s}_1, \ldots, {\bf s}_N$, and we’d like to predict the function’s value at some new location ${\bf s}_0$. Kriging is a technique for extrapolating our measurements to arbitrary locations. For an in-depth discussion, see Cressie and Wikle (2011). Here I derive Kriging in a simplified case.

I will assume that $Z$ is an intrinsically stationary process. In other words, there exists some semivariogram $\gamma({\bf h})$ such that

\[ \text{var}[Z({\bf s}+{\bf h}) – Z({\bf s})] = 2\gamma({\bf h}) . \]

Furthermore, I will assume that the process is isotropic, (i.e. that $\gamma({\bf h})$ is a function only of $||h||$). As Andy described here, the existence of a covariance function implies intrinsic stationarity. In addition, I will assume that the process has a constant mean, $\mathbb E[Z({\bf s})] = \mu$. We would like to estimate $Z({\bf s})$ with a linear combination of our current observations. Our estimator will be

\[ \hat{Z} = \sum_{n=1}^N \lambda_n Z({\bf s}_n) ,\]

where the weights $\lambda_n$ can be positive or negative. We further require that $\sum_n \lambda_n = 1$ so that our estimate is unbiased. We would like to choose the weights $\lambda_n$ so as to minimize the mean-squared predictive error

\[ \text{MSPE}(\lambda_1, \ldots, \lambda_N) = \mathbb E[(\hat{Z} – Z({\bf s}) )^2]  .\]

Let $\gamma_{nm}$ denote $\gamma({\bf s}_n – {\bf s}_m)$. Expanding the expression for the mean-squared predictive error, we get

\[  \sum_{n,m} \lambda_n \lambda_m \mathbb E[Z({\bf s}_n)Z({\bf s}_m)] + \mathbb E[Z({\bf s}_0)^2]  – 2\sum_n \lambda_n \mathbb E[Z({\bf s}_n) Z({\bf s}_0)] . \]

Adding and subtracting $\sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2]$, this expression breaks into $A+B$, where


A & = & -\sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2] + \sum_{n,m} \lambda_n\lambda_m \mathbb E[Z({\bf s}_n)Z({\bf s}_m)] \\

& = & -\frac12 \sum_{n,m} \lambda_n\lambda_m \mathbb E[(Z({\bf s}_n) – Z({\bf s}_m))^2] \\

& = & -\sum_{n,m} \lambda_n\lambda_m \gamma_{nm} ,




B & = & \mathbb E[Z({\bf s}_0)^2] – 2\sum_n \lambda_n \mathbb E[Z({\bf s}_n)Z({\bf s}_0)] + \sum_n \lambda_n \mathbb E[Z({\bf s}_n)^2] \\

& = & \sum_n \lambda_n \mathbb E[(Z({\bf s}_0) – Z({\bf s}_n))^2] \\

& = & 2 \sum_n \lambda_n \gamma_{0n}.


We minimize the quantity $A+B$ subject to the constraint $\sum_n \lambda_n = 1$ using Lagrange multipliers. To simplify the notation, define the matrix $\Gamma$ by $\Gamma_{nm} = \gamma_{nm}$, the vector $\boldsymbol\lambda = (\lambda_1, \ldots, \lambda_N)^{\mathsf T}$, the vector $\boldsymbol\gamma_0 = (\gamma_{01}, \ldots, \gamma_{0N})^{\mathsf T}$, and the vector ${\bf 1} = (1, \ldots, 1)^{\mathsf T}$. Then the mean-squared predictive error is given by

\[ -\boldsymbol\lambda^{\mathsf T} \Gamma \boldsymbol\lambda + 2\boldsymbol\lambda^{\mathsf T} \boldsymbol\gamma_0 .\]

Incorporating the Lagrange multiplier constraint, we have the quantity

\[ \Phi(\boldsymbol\lambda, \alpha) =  -\boldsymbol\lambda^{\mathsf T} \Gamma \boldsymbol\lambda + 2\boldsymbol\lambda^{\mathsf T} \boldsymbol\gamma_0 – \alpha({\bf 1}^{\mathsf T}\boldsymbol\lambda – 1) .\]

Differentiating $\Phi$ with respect to $\alpha$ gives back our constraint. Differentiating with respect to each $\lambda_n$ and concatenating the resulting equations into matrix form gives


& & -2\Gamma\boldsymbol\lambda + 2\boldsymbol\gamma_0 = \alpha {\bf 1} \\

& \Leftrightarrow & \boldsymbol\lambda = \Gamma^{-1}(\boldsymbol\gamma_0 – \alpha {\bf 1}/2) .


Incorporating the constraint gives


1 & = & {\bf 1}^{\mathsf T} \boldsymbol\lambda \\

& = & {\bf 1}^{\mathsf T} \Gamma^{-1}(\boldsymbol\gamma_0 – \alpha {\bf 1}/2) .


Solving for $\alpha$ and plugging this back into our formula for $\boldsymbol\lambda$, we find that

\[ \boldsymbol\lambda = \Gamma^{-1}\left( \boldsymbol\gamma_0 – \frac{{\bf 1}^{\mathsf T}\Gamma^{-1}\boldsymbol\gamma_0 – 1}{{\bf 1}^{\mathsf T}\Gamma^{-1}{\bf 1}}{\bf 1} \right) .\]

This gives us our optimal Kriging predictor.