Variational Inference (part 1)

Andy Miller · Machine Learning, Probability, Statistics, Uncategorized

I will dedicate the next few posts to variational inference methods as a way to organize my own understanding – this first one will be pretty basic. The goal of variational inference is to approximate an intractable probability distribution, $p$, with a tractable one, $q$, in a way that makes them as ‘close’ as possible. Let’s unpack that statement a bit.
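Concretely, ‘close’ is almost always measured with the Kullback–Leibler divergence: pick the member of some tractable family $\mathcal{Q}$ that minimizes the divergence from $q$ to $p$. (Generic textbook notation here, not necessarily the notation the rest of the post uses.)

$$
q^* \;=\; \operatorname*{argmin}_{q \in \mathcal{Q}} \, \mathrm{KL}\big(q(x) \,\|\, p(x)\big)
\;=\; \operatorname*{argmin}_{q \in \mathcal{Q}} \, \mathbb{E}_{q}\!\left[\log \frac{q(x)}{p(x)}\right].
$$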

Variograms, Covariance functions and Stationarity

Andy Miller · Statistics

I just started a course on spatial statistics, so I’ve got covariance functions and variograms on the mind. This post is mostly for me to work through their intuition and relationship. Say you have some spatio-temporal process, with specific locations denoted $s$, and the value of the process at those points denoted $Y(s)$. For concreteness, these locations could be latitude and longitude and the field could be the outdoor temperature. Or maybe the locations are the space-time positions of a player on a basketball court and the field is her shot percentage or scoring efficiency from that point.
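As a concrete (hypothetical) illustration of these objects, here is a minimal NumPy sketch of the classical empirical semivariogram, $\hat{\gamma}(h) = \tfrac{1}{2}\,\mathrm{avg}\{(Y(s_i) - Y(s_j))^2 : \|s_i - s_j\| \approx h\}$; the function names and toy data are my own, not from the post:

```python
import numpy as np

def empirical_variogram(locs, y, bins):
    """Estimate gamma(h) = 0.5 * E[(Y(s) - Y(s+h))^2] within distance bins."""
    # pairwise distances between locations and squared differences of the field values
    diffs = locs[:, None, :] - locs[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    sq_diffs = (y[:, None] - y[None, :]) ** 2

    gamma = np.full(len(bins) - 1, np.nan)
    for k in range(len(bins) - 1):
        mask = (dists >= bins[k]) & (dists < bins[k + 1])
        if mask.any():
            gamma[k] = 0.5 * sq_diffs[mask].mean()
    return gamma

# toy usage: 200 random 2-d locations, a smooth field plus observation noise
rng = np.random.default_rng(0)
locs = rng.uniform(0, 10, size=(200, 2))
y = np.sin(locs[:, 0]) + 0.1 * rng.standard_normal(200)
print(empirical_variogram(locs, y, bins=np.linspace(0, 5, 11)))
```

Under stationarity this curve should level off at the marginal variance of the process, and its shape near zero says how quickly correlation decays with distance.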

DPMs and Consistency

Andy Miller · Machine Learning, Statistics

Jeff Miller and Matthew Harrison at Brown (go Bears!) have recently explored the posterior consistency of Dirichlet process mixture (DPM) models, emphasizing one particular drawback. For setup, say you have some observed data $x_1, \dots, x_n$ from a mixture of two normals, such as

$$ x_i \overset{iid}{\sim} \tfrac{1}{2}\mathcal{N}(\mu_1, \sigma^2) + \tfrac{1}{2}\mathcal{N}(\mu_2, \sigma^2). $$

In this case, the number of clusters, $k$, is two, and one would imagine that as $n$ grows, the posterior distribution of $k$ would converge to 2, i.e. $p(k = 2 \mid x_{1:n}) \to 1$. However, this is not true if you model the data with a DPM (or, more generally, if you model the mixing measure as a Dirichlet process, $G \sim \mathrm{DP}(\alpha, G_0)$).
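For reference, a standard way to write the DPM of Gaussians being referred to here (generic notation, not necessarily the paper's) is

$$
\begin{aligned}
G &\sim \mathrm{DP}(\alpha, G_0), \\
\theta_i \mid G &\overset{iid}{\sim} G, \\
x_i \mid \theta_i &\sim \mathcal{N}(\theta_i, \sigma^2), \qquad i = 1, \dots, n,
\end{aligned}
$$

where the number of distinct values among $\theta_1, \dots, \theta_n$ plays the role of the number of clusters. Miller and Harrison's point is that the posterior on that quantity need not concentrate on the true number of components as $n$ grows.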

Nonparanormal Activity

Andy Miller · Machine Learning

Say you have a set of $p$-dimensional iid samples $x_1, \dots, x_n$ drawn from some unknown continuous distribution that you want to estimate with an undirected graphical model. You can sometimes get away with assuming the $x_i$’s are drawn from a multivariate normal (MVN), and from there you can use a host of methods for estimating the covariance matrix $\Sigma$, and thus the graph structure (perhaps imposing sparsity constraints for inferring structure in high-dimensional data). In other cases the Gaussian assumption is too restrictive (e.g. when marginals exhibit multimodal behavior). One way to augment the expressivity of the MVN while maintaining some of its desirable properties is to assume that some function of the data is MVN.
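The transform being alluded to is a copula-style ‘Gaussianization’ of the marginals. Here is a minimal sketch of that idea; the function name, the rank-based CDF estimate, and the toy data are my own illustrative choices, not the estimator from any particular paper:

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussianize(X):
    """Map each marginal of X to (approximately) standard normal via its empirical CDF."""
    n, p = X.shape
    Z = np.empty_like(X, dtype=float)
    for j in range(p):
        u = rankdata(X[:, j]) / (n + 1)   # empirical CDF values, kept strictly inside (0, 1)
        Z[:, j] = norm.ppf(u)             # Phi^{-1} pushes them onto the normal scale
    return Z

# toy usage: bimodal marginals that a plain MVN would fit poorly
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-3, 1, (100, 3)), rng.normal(3, 1, (100, 3))])
Z = gaussianize(X)
Sigma_hat = np.cov(Z, rowvar=False)  # MVN machinery applied to the transformed data
print(Sigma_hat)
```

After the transform, the usual MVN toolkit (covariance estimation, sparsity penalties on the precision matrix) can be applied to $Z$ rather than to $X$ directly.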