The Central Limit Theorem

Robert NishiharaBlog, Probability, Statistics

The proof and intuition presented here come from this excellent writeup by Yuval Filmus, which in turn draws upon ideas in this book by Fumio Hiai and Denes Petz. Suppose that we have a sequence of real-valued random variables . Define the random variable (1)   to be a scaled sum of the first variables in the sequence. Now, we would like to make interesting statements about the sequence (2)  

Optimal Spatial Prediction with Kriging

Robert NishiharaBlog, Machine Learning, Statistics

Suppose we are modeling a spatial process (for instance, the amount of rainfall around the world, the distribution of natural resources, or the population density of an endangered species). We’ve measured the latent function at some locations , and we’d like to predict the function’s value at some new location . Kriging is a technique for extrapolating our measurements to arbitrary locations. For an in-depth discussion, see Cressie and Wikle (2011). Here I derive Kriging in a simplified case. I will assume that is an intrinsically stationary process. In other words, there exists some semivariogram such that     Furthermore, I will assume that the process is isotropic, (i.e. that is a function only of ). As Andy described here, the existence … Read More

Pseudo-marginal MCMC

Robert NishiharaComputation, Probability, Statistics

This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function , with , and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state to a proposed state with probability . But what if we cannot evaluate exactly? Such a situation might arise if we are given a joint density function , with , and we must marginalize out in order to compute . In this situation, we may only be able to approximate    

An Auxiliary Variable Trick for MCMC

Robert NishiharaMachine Learning, Probability, Recent work1 Comment

I recently uploaded the paper “Parallel MCMC with Generalized Elliptical Slice Sampling” to the arXiv. I’d like to highlight one trick that we used, but first I’ll give some background. Markov chain Monte Carlo (MCMC) is a class of algorithms for generating samples from a specified probability distribution (in the continuous setting, the distribution is generally specified by its density function). Elliptical slice sampling is an MCMC algorithm that can be used to sample distributions of the form (1)   where is a multivariate Gaussian prior with mean and covariance matrix , and is a likelihood function. Suppose we want to generalize this algorithm to sample from arbitrary continuous probability distributions. We could simply factor the distribution as (2)  

Exponential Families and Maximum Entropy

Robert NishiharaProbability, StatisticsLeave a Comment

An exponential family parametrized by is the set of probability distributions that can be expressed as     for given functions (the partition function), , and (the vector of sufficient statistics). Exponential families can be discrete or continuous, and examples include Gaussian distributions, Poisson distributions, and gamma distributions. Exponential families have a number of desirable properties. For instance, they have conjugate priors and they can summarize arbitrary amounts of data using a fixed-size vector of sufficient statistics. But in addition to their convenience, their use is theoretically justified.

Dealing with Reliability when Crowdsourcing

Robert NishiharaMachine Learning, StatisticsLeave a Comment

I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy. To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are tasks and workers, and assume that the possible labels are . Define the matrix so that is the label given to task by worker (or … Read More

A Continuous Approach to Discrete MCMC

Robert NishiharaMachine Learning, StatisticsLeave a Comment

Continuous problems are often simpler to solve than discrete problems. This is true in many optimization problems (for instance, linear programming versus integer linear programming). In the case of Markov chain Monte Carlo (MCMC), sampling continuous distributions has some advantages over sampling discrete distributions due to the availability of gradient information in the continuous case. The paper “Continuous Relaxations for Discrete Hamiltonian Monte Carlo” by Yichuan Zhang, Charles Sutton, Amos Storkey, and Zoubin Ghahramani explores the idea of performing inference in the discrete setting by deriving and sampling a related continuous distribution. Here I describe the approach taken in this paper.

Learning Image Features from Video

Robert NishiharaMachine LearningLeave a Comment

While at NIPS, I came across the paper Deep Learning of Invariant Features via Simulated Fixations in Video by Will Zou, Shenghuo Zhu, Andrew Ng,  and Kai Yu. It proposes a particularly appealing unsupervised method for using videos to learn image features. Their method appears to be somewhat inspired by the human visual system. For instance, people have access to video data, not static images. They also attempt to mimic the human tendency to fixate on particular objects. They track objects through successive frames in order to provide more coherent data to the learning algorithm. The authors use a stacked architecture, where each layer is trained by optimizing an embedding into a feature space. As usual, the optimization problem involves a reconstruction … Read More