Data compression and unsupervised learning, Part 2

[latexpage] This is a continuation of my last post about data compression and machine learning. In this post, I will start to address the question: Does “good” compression generally lead to “good” unsupervised learning? To answer this question, we need to start with another question: What is a “good” compression algorithm?

An Auxiliary Variable Trick for MCMC

Robert Nishihara 1 Comment

[latexpage]I recently uploaded the paper “Parallel MCMC with Generalized Elliptical Slice Sampling” to the arXiv. I’d like to highlight one trick that we used, but first I’ll give some background. Markov chain Monte Carlo (MCMC) is a class of algorithms for generating samples from a specified probability distribution $\pi({\bf x})$ (in the continuous setting, the distribution is generally specified by its density function). Elliptical slice sampling is an MCMC algorithm that can be used to sample distributions of the form \begin{equation} \pi({\bf x}) \propto \mathcal N({\bf x};\boldsymbol\mu,\boldsymbol\Sigma) L({\bf x}), \end{equation} where $\mathcal N({\bf x};\boldsymbol\mu,\boldsymbol\Sigma)$ is a multivariate Gaussian prior with mean $\boldsymbol\mu$ and covariance matrix $\boldsymbol\Sigma$, and $L({\bf x})$ is a likelihood function. Suppose we want to generalize this algorithm to sample …

What is representation learning?

Roger Grosse Machine Learning 1 Comment

In my last post, I argued that a major distinction in machine learning is between predictive learning and representation learning. Now I’ll take a stab at summarizing what representation learning is about. Or, at least, what I think of as the first principal component of representation learning.

High-Dimensional Probability Estimation with Deep Density Models

[latexpage] Ryan Adams and I just uploaded to the arXiv our paper “High-Dimensional Probability Estimation with Deep Density Models”. In this work, we introduce the deep density model (DDM), a new approach for density estimation.

Data compression and unsupervised learning

Data compression and unsupervised learning are two concepts whose relationship is perhaps underappreciated. Compression and unsupervised learning are both about finding patterns in data — but, does the similarity go any further? I argue that it does.

Learning Theory: Purely Theoretical?

[latexpage] What’s learning theory good for, anyway? As I mentioned in my earlier blog post, not infrequently get into conversations with people in machine learning and related fields who don’t see the benefit of learning theory (that is, theory of learning). While that post offered one specific piece of evidence of how work seemingly only relevant in pure theory could lead to practical algorithms, I thought I would talk in more general terms why I see learning theory as a worthwhile endeavor. There are two main flavors of learning theory, statistical learning theory (StatLT) and computational learning (CompLT). StatLT originated with Vladimir Vapnik, while the canonical example of CompLT, PAC learning, was formulated by Leslie Valiant. StatLT, in line with its “statistical” …

Getting above the fray with lifted inference

Hi, I’m Jon. In my series of posts, I’ll be writing about how we can use the modern Bayesian toolkit to efficiently make decisions, solve problems, and formulate plans (the providence of AI), rather than restrict ourselves to approximating posteriors (the providence of statistics and much of machine learning). Here’s a simple example of how AI can help out machine learning. What was the first graphical model you were exposed to? There’s a good chance it was Pearl’s famous “Sprinkler, Rain, Wet grass” graphical model.

What the hell is representation? *

Roger Grosse’s post on the need for a “solid theoretical framework” for “representation learning” is very intriguing. The term representation is ubiquitous in machine learning (for instance, it showed up in at least eight previous posts in this blog) and computational neuroscience (how are objects and concepts represented within the brain). My personal fascination with the topic started after watching David Krakauer’s talk on evolution of intelligence on earth, where he listed representation- in additions to inference, strategy, and Competition- as one of the tenets of intelligence; suggesting that our representations are tightly connected to the goals we aim to accomplish, how we infer hidden causes, what strategy we take on, and what competitive forces we have to deal with. …

Predictive learning vs. representation learning

[latexpage]I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy. To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are $N$ tasks and $M$ workers, and assume that the possible labels are $\{\pm 1\}$. Define the $N \times M$ matrix $L$ so that $L_{ij}$ is …