An Auxiliary Variable Trick for MCMC

Robert Nishihara · Machine Learning, Probability, Recent work

I recently uploaded the paper “Parallel MCMC with Generalized Elliptical Slice Sampling” to the arXiv. I’d like to highlight one trick that we used, but first I’ll give some background. Markov chain Monte Carlo (MCMC) is a class of algorithms for generating samples from a specified probability distribution (in the continuous setting, the distribution is generally specified by its density function). Elliptical slice sampling is an MCMC algorithm that can be used to sample distributions of the form

(1)   $p(x) \propto \mathcal{N}(x; \mu, \Sigma) \, L(x),$

where $\mathcal{N}(x; \mu, \Sigma)$ is a multivariate Gaussian prior with mean $\mu$ and covariance matrix $\Sigma$, and $L$ is a likelihood function. Suppose we want to generalize this algorithm to sample from an arbitrary continuous probability distribution $\pi$. We could simply factor the distribution as

(2)   $\pi(x) = \mathcal{N}(x; \mu, \Sigma) \cdot \frac{\pi(x)}{\mathcal{N}(x; \mu, \Sigma)}.$
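The excerpt cuts off here, but the factorization in (2) already suggests a recipe: treat the ratio $\pi(x) / \mathcal{N}(x; \mu, \Sigma)$ as the likelihood $L(x)$ in (1) and run standard elliptical slice sampling. Below is a minimal sketch of that idea, assuming a user-supplied log density log_target and a user-chosen Gaussian (mu, Sigma); it is an illustration of the trick, not the paper's implementation.

import numpy as np
from scipy.stats import multivariate_normal

def elliptical_slice(x, log_target, mu, Sigma, rng):
    """One transition of elliptical slice sampling targeting log_target,
    using N(mu, Sigma) as the Gaussian factor from equation (2)."""
    gauss = multivariate_normal(mean=mu, cov=Sigma)
    log_lik = lambda z: log_target(z) - gauss.logpdf(z)   # residual "likelihood"

    nu = rng.multivariate_normal(mu, Sigma)                # auxiliary draw defining the ellipse
    log_y = log_lik(x) + np.log(rng.uniform())             # slice height
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta

    while True:
        # point on the ellipse through x and nu (centered at mu)
        x_prop = (x - mu) * np.cos(theta) + (nu - mu) * np.sin(theta) + mu
        if log_lik(x_prop) > log_y:
            return x_prop
        # shrink the bracket toward theta = 0 and try again
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

Iterating this transition (x = elliptical_slice(x, log_pi, mu, Sigma, rng) inside a loop) leaves $\pi$ invariant for any non-degenerate choice of mu and Sigma; choosing them well is what makes the generalized scheme effective in practice.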

The Correct Birth/Death Jacobian for Mixture Models

Nick Foti · Uncategorized

Reversible jump Markov chain Monte Carlo (RJMCMC) [1] is an extension of the Metropolis-Hastings algorithm that allows sampling from a distribution over models with potentially different numbers of parameters. In this post we are interested in determining the number of components to use when modeling data with a mixture model; the number of components determines the dimension of the space the sampler walks through. The point of this post is to clear up a common error in the literature when computing the Jacobian that arises in the algorithm’s acceptance ratio.

A Geometric Intuition for Markov’s Inequality

Peter Krafft · Probability

As the title of the post suggests, this week I will discuss a geometric intuition for Markov’s inequality, which states that for a nonnegative random variable $X$ and any $a > 0$,

$P(X \geq a) \leq \frac{\mathbb{E}[X]}{a}.$

This is a simple result in basic probability that still felt surprising every time I used it… until very recently. (Warning: basic measure-theoretic probability lies ahead. These notes look like they provide sufficient background if this post is confusing and you are sufficiently motivated!)
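To see how loose or tight the bound can be, here is a quick numerical check (my own addition, not from the post), using $X \sim \text{Exp}(1)$ so that $\mathbb{E}[X] = 1$:

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1), so E[X] = 1

for a in (1.0, 2.0, 5.0):
    tail = np.mean(x >= a)     # Monte Carlo estimate of P(X >= a)
    bound = x.mean() / a       # Markov bound E[X] / a
    print(f"a = {a}: P(X >= a) ~= {tail:.4f} <= E[X]/a ~= {bound:.4f}")

For Exp(1) the exact tail is $e^{-a}$, so the bound $1/a$ is quite loose for large $a$; Markov’s inequality trades tightness for requiring nothing beyond nonnegativity and a finite mean.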

The Alias Method: Efficient Sampling with Many Discrete Outcomes

Ryan Adams · Computation, Statistics

When implementing algorithms for inference and learning with probabilistic models, it commonly comes up that one needs to sample from a discrete distribution. That is, from a multinomial distribution with parameter $\pi \in \mathbb{R}^K$ such that $\pi_k \geq 0$ and $\sum_k \pi_k = 1$. A somewhat more common occurrence is that we have a $\phi \in \mathbb{R}^K$ where $\phi_k \geq 0$, but we don’t know the normalization constant. That is, our $\phi$ is only proportional to the multinomial parameter $\pi$. We want to rapidly generate a variate according to $\pi$, given $\phi$, something easily done with (Matlab) code such as this (paraphrased from Tom Minka’s Lightspeed Toolbox):

cdf = cumsum(phi);
samp_k = sum(cdf < rand()*cdf(end)) + 1;

This is nice and simple, but you’ll notice that it has $O(K)$ time complexity for setup (computing the … Read More
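The excerpt is truncated here, but since the alias method is the post’s subject, here is a minimal sketch of it (Vose’s variant, written in Python rather than Matlab; the function names are mine, not the post’s). Setup is O(K), and each subsequent draw is O(1), which is what makes the method attractive when K is large and many samples are needed.

import numpy as np

def alias_setup(phi):
    """Build alias tables from unnormalized nonnegative weights phi."""
    phi = np.asarray(phi, dtype=float)
    K = len(phi)
    prob = phi * K / phi.sum()                 # scaled probabilities with mean 1
    alias = np.zeros(K, dtype=int)
    small = [k for k in range(K) if prob[k] < 1.0]
    large = [k for k in range(K) if prob[k] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l                           # top up the small cell from a large one
        prob[l] -= 1.0 - prob[s]
        (small if prob[l] < 1.0 else large).append(l)
    for k in small + large:                    # leftovers are (numerically) full cells
        prob[k] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """Draw one outcome in O(1): pick a cell uniformly, then flip its biased coin."""
    k = rng.integers(len(prob))
    return k if rng.random() < prob[k] else alias[k]

# Example usage with unnormalized weights.
rng = np.random.default_rng(0)
prob, alias = alias_setup([2.0, 1.0, 7.0])
samples = [alias_draw(prob, alias, rng) for _ in range(10)]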

What is representation learning?

Roger Grosse · Machine Learning

In my last post, I argued that a major distinction in machine learning is between predictive learning and representation learning. Now I’ll take a stab at summarizing what representation learning is about. Or, at least, what I think of as the first principal component of representation learning.

A Parallel Gamma Sampling Implementation

Scott Linderman · Computation, Statistics

I don’t have a favorite distribution, but if I had to pick one, I’d say the gamma. Why not the Gaussian? Because everyone loves the Gaussian! But when you want a prior distribution for the mean of your Poisson, or the precision of your Normal, who’s there to pick up the mess when the Gaussian lets you down? The gamma. When you’re trying to actually sample that Dirichlet that makes such a nice prior distribution for categorical distributions over your favorite distribution (how about that tongue twister), who’s there to help you? You guessed it, the gamma. But if you want a distribution that you can sample millions of times during each iteration of your MCMC algorithm, well, now the … Read More
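For reference, the gamma-to-Dirichlet connection the excerpt alludes to is the standard normalization trick; a tiny sketch (my addition, not code from the post):

import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([0.5, 1.0, 2.0])       # example Dirichlet concentration parameters
g = rng.gamma(shape=alpha, scale=1.0)   # one independent Gamma(alpha_k, 1) draw per component
x = g / g.sum()                         # x ~ Dirichlet(alpha)

So a fast gamma sampler immediately buys fast Dirichlet samples, which is part of why generating huge numbers of gamma variates cheaply matters in practice.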

Exponential Families and Maximum Entropy

Robert Nishihara · Probability, Statistics

An exponential family parametrized by $\theta$ is the set of probability distributions that can be expressed as

$p(x \mid \theta) = \frac{1}{Z(\theta)} \, h(x) \exp\left(\theta^\top \phi(x)\right)$

for given functions $Z$ (the partition function), $h$, and $\phi$ (the vector of sufficient statistics). Exponential families can be discrete or continuous, and examples include Gaussian distributions, Poisson distributions, and gamma distributions. Exponential families have a number of desirable properties. For instance, they have conjugate priors and they can summarize arbitrary amounts of data using a fixed-size vector of sufficient statistics. But in addition to their convenience, their use is theoretically justified.
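As a quick concrete example (a standard one, not necessarily the one the post uses), the Poisson distribution with rate $\lambda$ fits this form:

$p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{Z(\theta)} \, h(x) \exp\left(\theta \, \phi(x)\right), \quad h(x) = \frac{1}{x!}, \quad \phi(x) = x, \quad \theta = \log \lambda, \quad Z(\theta) = e^{\lambda} = \exp(e^{\theta}).$

Here the single sufficient statistic $\phi(x) = x$ means that, no matter how many observations we see, the sum $\sum_i x_i$ (together with the count) is all we need to retain for inference.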