Variational Inference (part 1)

Andy Miller | Machine Learning, Probability, Statistics, Uncategorized

I will dedicate the next few posts to variational inference methods as a way to organize my own understanding – this first one will be pretty basic. The goal of variational inference is to approximate an intractable probability distribution, $p$, with a tractable one, $q$, in a way that makes them as ‘close’ as possible. Let’s unpack that statement a bit.
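The excerpt leaves ‘close’ unspecified; as a standard point of reference (my notation, writing $p$ for the intractable target and $q$ for the tractable approximation), variational methods typically choose $q$ from a tractable family $\mathcal{Q}$ by minimizing the reverse KL divergence:

```latex
q^{*} \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \; \mathrm{KL}\!\left(q \,\|\, p\right)
      \;=\; \operatorname*{arg\,min}_{q \in \mathcal{Q}} \; \mathbb{E}_{q}\!\left[\log \frac{q(x)}{p(x)}\right].
```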

Geometric means of distributions

Roger Grosse | Machine Learning

Annealed importance sampling [1] is a widely used algorithm for inference in probabilistic models, as well as computing partition functions. I’m not going to talk about AIS itself here, but rather one aspect of it: geometric means of probability distributions, and how they (mis-)behave.
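For concreteness (this is standard AIS background, not a quote from the post): AIS bridges an initial distribution $p_0$ and a target $p_1$ with a sequence of intermediate distributions, usually taken to be geometric means of the two,

```latex
p_{\beta}(x) \;\propto\; p_0(x)^{\,1-\beta}\, p_1(x)^{\,\beta}, \qquad 0 \le \beta \le 1,
```

and the post's question is how these intermediate distributions behave as $\beta$ is swept from 0 to 1.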

Learning Theory: What Next?

Jonathan Huggins | Machine Learning

In my previous post, I wrote about why I find learning theory to be a worthwhile endeavor. In this post I want to discuss a few open/under-explored areas in learning theory that I think are particularly interesting. If you know of progress in these areas, please email me or post in the comments.

An Auxiliary Variable Trick for MCMC

Robert Nishihara | Machine Learning, Probability, Recent work

I recently uploaded the paper “Parallel MCMC with Generalized Elliptical Slice Sampling” to the arXiv. I’d like to highlight one trick that we used, but first I’ll give some background. Markov chain Monte Carlo (MCMC) is a class of algorithms for generating samples from a specified probability distribution (in the continuous setting, the distribution is generally specified by its density function). Elliptical slice sampling is an MCMC algorithm that can be used to sample distributions of the form

(1)   $p(x) \propto \mathcal{N}(x; \mu, \Sigma)\, L(x),$

where $\mathcal{N}(x; \mu, \Sigma)$ is a multivariate Gaussian prior with mean $\mu$ and covariance matrix $\Sigma$, and $L(x)$ is a likelihood function. Suppose we want to generalize this algorithm to sample from arbitrary continuous probability distributions. We could simply factor the distribution as

(2)   $p(x) = \mathcal{N}(x; \mu, \Sigma) \cdot \dfrac{p(x)}{\mathcal{N}(x; \mu, \Sigma)}.$
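The excerpt stops before the trick itself; for background only, here is a minimal sketch of standard elliptical slice sampling (Murray, Adams & MacKay, 2010) for a target of form (1), assuming the Gaussian prior is given by its mean and a Cholesky factor of its covariance. The function name and arguments are mine, not taken from the paper.

```python
import numpy as np

def elliptical_slice_sample(x, log_likelihood, mu, chol_Sigma, rng=None):
    """One elliptical slice sampling update for p(x) ∝ N(x; mu, Sigma) L(x).

    x              : current state, shape (d,)
    log_likelihood : function returning log L(x)
    mu, chol_Sigma : mean and lower Cholesky factor of the Gaussian prior
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]

    # Draw the auxiliary point defining the ellipse from the prior.
    nu = mu + chol_Sigma @ rng.standard_normal(d)

    # Log-likelihood threshold defining the slice.
    log_y = log_likelihood(x) + np.log(rng.uniform())

    # Initial proposal angle and shrinking bracket.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta

    while True:
        # Point on the ellipse through x and nu, centred at mu.
        x_prop = (x - mu) * np.cos(theta) + (nu - mu) * np.sin(theta) + mu
        if log_likelihood(x_prop) > log_y:
            return x_prop
        # Reject: shrink the bracket toward theta = 0 and try again.
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)
```

The factorization in (2) is what lets a scheme like this be aimed at an arbitrary continuous density: the first factor plays the role of the Gaussian prior and the second factor is treated as the “likelihood.”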

What is representation learning?

Roger Grosse | Machine Learning

In my last post, I argued that a major distinction in machine learning is between predictive learning and representation learning. Now I’ll take a stab at summarizing what representation learning is about. Or, at least, what I think of as the first principal component of representation learning.

Learning Theory: Purely Theoretical?

Jonathan Huggins | Machine Learning, Meta

What’s learning theory good for, anyway? As I mentioned in my earlier blog post, I not infrequently get into conversations with people in machine learning and related fields who don’t see the benefit of learning theory (that is, the theory of learning). While that post offered one specific piece of evidence of how work seemingly relevant only to pure theory could lead to practical algorithms, I thought I would talk in more general terms about why I see learning theory as a worthwhile endeavor. There are two main flavors of learning theory: statistical learning theory (StatLT) and computational learning theory (CompLT). StatLT originated with Vladimir Vapnik, while the canonical example of CompLT, PAC learning, was formulated by Leslie Valiant. StatLT, in line with its “statistical” descriptor, … Read More

Getting above the fray with lifted inference

Jonathan Malmaud | Computation, Machine Learning

Hi, I’m Jon. In my series of posts, I’ll be writing about how we can use the modern Bayesian toolkit to efficiently make decisions, solve problems, and formulate plans (the province of AI), rather than restrict ourselves to approximating posteriors (the province of statistics and much of machine learning). Here’s a simple example of how AI can help out machine learning. What was the first graphical model you were exposed to? There’s a good chance it was Pearl’s famous “Sprinkler, Rain, Wet grass” graphical model [1].
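As a quick refresher on that network (the conditional probability values below are made up for illustration and are not taken from [1], nor is this the post's lifted-inference material), here is the joint factorization of a minimal Sprinkler/Rain/Wet-grass model and a posterior query answered by brute-force enumeration:

```python
import itertools

# Illustrative CPTs: Sprinkler and Rain are independent parents of WetGrass.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
P_wet_given = {  # P(WetGrass=True | Sprinkler, Rain)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.90,
    (False, False): 0.01,
}

def joint(sprinkler, rain, wet):
    """P(Sprinkler, Rain, WetGrass) under the factorization P(S) P(R) P(W | S, R)."""
    p_wet_true = P_wet_given[(sprinkler, rain)]
    return P_sprinkler[sprinkler] * P_rain[rain] * (p_wet_true if wet else 1.0 - p_wet_true)

# Posterior P(Rain = True | WetGrass = True) by summing out Sprinkler.
numer = sum(joint(s, True, True) for s in (True, False))
denom = sum(joint(s, r, True) for s, r in itertools.product((True, False), repeat=2))
print("P(Rain | wet grass) =", numer / denom)
```

Enumeration like this is exact but scales exponentially in the number of variables, which is part of why smarter inference schemes are worth caring about.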