I just started a course on spatial statistics, so I’ve got covariance functions and variograms on the mind. This post is mostly for me to work through their intuition and relationship.
Say you have some spatio-temporal process, with specific locations denoted $s_1, s_2, \dots$, and the value of the process at those points written $z(s_1), z(s_2), \dots$. For concreteness, the locations could be latitude–longitude pairs and the field could be the outdoor temperature. Or maybe the locations are the space-time coordinates $(x, y, t)$ of a player on a basketball court and the field is her shot percentage or scoring efficiency from that point.
Either way, one basic intuition is that values at two close points in a field will be more ‘similar’ than values at two distant points. This is a loaded assumption: even temperature values are influenced by weather patterns and other exogenous variables that can create unexplained variance. But we’ll go with it for now (and there are ways of accounting for these latent influences).
One way to encode this assumption is a covariance function, which simply takes in two points $s_1, s_2$ and spits out a covariance value. Given a covariance function $K(s, s')$, the random variables $Z(s_1), Z(s_2), \dots, Z(s_n)$ will have the covariance structure given by
$$
\begin{pmatrix}
K(s_1, s_1) & \dots & K(s_1, s_n) \\
\vdots & \ddots & \vdots \\
K(s_n, s_1) & \dots & K(s_n, s_n)
\end{pmatrix}
$$
Note that this function is restricted to only create positive semidefinite matrices. One commonly used covariance function is
K(s_1, s_2) = \sigma^2 \exp(-|s_1 - s_2|)
and another common choice, the squared-exponential (or Gaussian) covariance, squares the distance inside the exponential.
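As a sanity check on the positive-semidefiniteness requirement above, here is a minimal sketch (using numpy, with hypothetical 2-D locations) that builds the covariance matrix from the exponential covariance function and inspects its eigenvalues:

```python
import numpy as np

def exp_cov(s1, s2, sigma2=1.0):
    """Exponential covariance: sigma^2 * exp(-||s1 - s2||)."""
    return sigma2 * np.exp(-np.linalg.norm(np.asarray(s1) - np.asarray(s2)))

# A few 2-D locations (illustrative coordinates, not real data).
locs = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
n = len(locs)

# Build the n x n matrix with entries K(s_i, s_j).
K = np.array([[exp_cov(locs[i], locs[j]) for j in range(n)] for i in range(n)])

# A valid covariance function must produce a positive semidefinite matrix:
# all eigenvalues nonnegative (up to floating-point tolerance).
eigvals = np.linalg.eigvalsh(K)
print(np.all(eigvals >= -1e-10))  # True
```

The exponential kernel is known to be positive definite in $\mathbb{R}^d$, so this check passes for any set of distinct locations, not just these four.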
One thing to notice is that this particular covariance function relies only on the separation between $s_1$ and $s_2$ (denoted by $h$), meaning it takes the form
K(h) = K(s+h, s)
When the covariance structure of a field relies only on this separation between points, and we assume a constant mean value $E(Z(s)) = \mu$, the process is second order (or weakly) stationary. In this particular case the covariance relies only on the magnitude of the separation, not its direction, which additionally makes the process isotropic.
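Both properties are easy to verify numerically for the exponential covariance. A small sketch (illustrative points and shift, chosen arbitrarily): stationarity means shifting both locations by the same vector leaves the covariance unchanged, and isotropy means rotating the pair about the origin does too.

```python
import numpy as np

def K(s1, s2, sigma2=1.0):
    # Exponential covariance from above: sigma^2 * exp(-||s1 - s2||).
    return sigma2 * np.exp(-np.linalg.norm(np.asarray(s1) - np.asarray(s2)))

s1, s2 = np.array([0.3, 1.2]), np.array([2.0, -0.5])
shift = np.array([10.0, -4.0])

# Stationarity: K depends only on the separation h = s1 - s2,
# so translating both points changes nothing.
print(np.isclose(K(s1, s2), K(s1 + shift, s2 + shift)))  # True

# Isotropy: rotations preserve ||s1 - s2||, so K is unchanged as well.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.isclose(K(R @ s1, R @ s2), K(s1, s2)))  # True
```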
Intuitively, this covariance matrix encodes that we expect the covariance in temperature between Boston and New York to be much higher than the covariance between Seattle and Miami.
But you can also consider the variance of the difference in temperature between a pair of cities. This is the intuition behind a variogram. The variogram function describes how much we expect the value of the field to differ between two positions. It is formally defined by
2 \gamma(s_1, s_2) = var( Z(s_1) - Z(s_2) )
Also note that, being a variance, this function needs to spit out nonnegative values for all locations $s_i, s_j$.
When the variance of the difference between two locations in the process relies only on the separation (and, again, the mean is constant), the process is said to be intrinsically stationary:
2 \gamma(h) = var( Z(s+h) - Z(s) )
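In practice the variogram is estimated from data. A sketch of the classical (Matheron) estimator, which averages squared differences over pairs of points at roughly the same separation — the grid, lags, and tolerance here are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a 1-D Gaussian process on a grid with exponential covariance,
# for which the true semivariogram is gamma(h) = sigma^2 * (1 - exp(-h)).
sigma2 = 1.0
s = np.arange(0.0, 50.0, 0.5)
K = sigma2 * np.exp(-np.abs(s[:, None] - s[None, :]))
z = rng.multivariate_normal(np.zeros(len(s)), K)

def empirical_semivariogram(s, z, lags, tol=0.25):
    """Matheron's classical estimator: half the average squared difference
    over all pairs whose separation falls within tol of each lag."""
    d = np.abs(s[:, None] - s[None, :])
    sq = (z[:, None] - z[None, :]) ** 2
    return np.array([0.5 * sq[np.abs(d - h) <= tol].mean() for h in lags])

lags = np.array([0.5, 1.0, 2.0, 4.0])
gamma_hat = empirical_semivariogram(s, z, lags)
```

With a single short realization the estimates are noisy, but they should roughly trace $1 - e^{-h}$, leveling off (the “sill”) as $h$ grows past the correlation length.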
And it turns out that the class of second order stationary processes is a subclass of the broader class of intrinsically stationary processes.
We can also relate second order stationary covariance functions to intrinsically stationary variograms:
2 \gamma(h) &= var( Z(s+h) - Z(s) ) \\
&= E[ Z(s+h)^2 ] + E[ Z(s)^2 ] - 2 E[ Z(s+h) Z(s) ] \\
&= var(Z(s+h)) + \mu^2 + var(Z(s)) + \mu^2 - 2 ( cov(Z(s+h), Z(s)) + \mu^2 ) \\
&= 2K(0) - 2K(h) \\
\implies \gamma(h) &= K(0) - K(h)
(The second line uses the fact that the increment $Z(s+h) - Z(s)$ has mean zero when the mean of the process is constant, so the variance is just the second moment of the difference.)
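The identity $\gamma(h) = K(0) - K(h)$ can be checked by Monte Carlo. A sketch for the exponential covariance at a single (arbitrary) lag, drawing many pairs $(Z(s), Z(s+h))$ from the implied bivariate normal:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, h = 1.0, 1.5
K0 = sigma2                   # K(0)
Kh = sigma2 * np.exp(-h)      # K(h) for the exponential covariance

# Joint distribution of (Z(s), Z(s+h)) under second order stationarity.
cov = np.array([[K0, Kh],
                [Kh, K0]])
mu = 5.0  # any constant mean; it cancels in the difference
samples = rng.multivariate_normal([mu, mu], cov, size=200_000)

# Monte Carlo semivariogram: half the sample variance of the increment.
diff = samples[:, 0] - samples[:, 1]
gamma_mc = 0.5 * diff.var()

gamma_theory = K0 - Kh
print(abs(gamma_mc - gamma_theory) < 0.03)  # True
```

The agreement holds regardless of the value of $\mu$, echoing the derivation above: the constant mean drops out of the increment.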
So there you have it: two ways of modeling spatial statistical dependence, and how they relate.