Which research results will generalize?

One approach to AI research is to work directly on applications that matter — say, trying to improve production systems for speech recognition or medical imaging. But most research, even in applied fields like computer vision, is done on highly simplified proxies for the real world. Progress on object recognition benchmarks — from toy-ish ones like MNIST, NORB, and Caltech101, to complex and challenging ones like ImageNet and Pascal VOC — isn’t valuable in its own right, but only insofar as it yields insights that help us design better systems for real applications.

So it’s natural to ask: which research results will generalize to new situations?


Prior knowledge and overfitting

When we talk about priors and regularization, we often motivate them in terms of “incorporating knowledge” or “preventing overfitting.” In a sense, the two are equivalent: any prior or regularizer must favor certain explanations relative to others, so favoring one explanation is equivalent to punishing others. But I’ll argue that these are two very different phenomena, and it’s useful to know which one is going on.

ICML Highlight: Fast Dropout Training

In this post, I’ll summarize one of my favorite papers from ICML 2013: Fast Dropout Training, by Sida Wang and Christopher Manning. This paper derives an analytic approximation to dropout, a randomized regularization method recently proposed for training deep nets that has allowed big improvements in predictive accuracy. Their approximation gives a roughly tenfold speedup under certain conditions. Much more interestingly, the authors also show strong connections to existing regularization methods, shedding light on why dropout works so well.

Testing MCMC code, part 2: integration tests

This is the second of two posts based on a testing tutorial I’m writing with David Duvenaud.

In my last post, I talked about checking the MCMC updates using unit tests. Most of the mistakes I’ve found in my own code were ones I caught with unit tests. (Naturally, I have no idea about the ones I haven’t caught.) But no matter how thoroughly we unit test, there are still subtle bugs that slip through the cracks. Integration testing is a more global approach: it tests the overall behavior of the software, which depends on the interaction of multiple components.

Compressing genomes

Here’s an interesting question: how much space would it take to store the genomes of everyone in the world? Well, there are about 3 billion base pairs in a genome, and at 2 bits per base (4 choices), that comes to 6 billion bits, or about 750 MB (say we are only storing one copy of each chromosome). Multiply this by 7 billion people and we get roughly 5,000 petabytes. Ouch! But we can do a lot better.
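The back-of-the-envelope estimate above can be checked in a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the exact total depends on whether you count in decimal petabytes (10^15 bytes) or binary ones (2^50 bytes), so treat the result as an order-of-magnitude figure.

```python
# Rough storage estimate for uncompressed genomes, using the figures in the text.
BASE_PAIRS = 3_000_000_000   # about 3 billion base pairs per genome
BITS_PER_BASE = 2            # 4 possible bases -> 2 bits each
PEOPLE = 7_000_000_000       # about 7 billion people


def genome_bytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Bytes needed for one genome (one copy of each chromosome)."""
    return base_pairs * bits_per_base // 8


def total_petabytes(people=PEOPLE, binary=False):
    """Total storage for everyone, in decimal (default) or binary petabytes."""
    unit = 2**50 if binary else 10**15
    return people * genome_bytes() / unit


if __name__ == "__main__":
    print("one genome (MB):", genome_bytes() / 10**6)
    print("everyone (decimal PB):", total_petabytes())
    print("everyone (binary PiB):", round(total_petabytes(binary=True)))
```

Either way the total lands in the thousands of petabytes, which is the point of the estimate: the naive encoding is far too large, so compression has to exploit how similar genomes are to one another.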

Testing MCMC code, part 1: unit tests

This post is taken from a tutorial I am writing with David Duvenaud.


When you write a nontrivial piece of software, how often do you get it completely correct on the first try? When you implement a machine learning algorithm, how thorough are your tests? If your answers are “rarely” and “not very,” stop and think about the implications.

There’s a large literature on testing the convergence of optimization algorithms and MCMC samplers, but I want to talk about a more basic problem here: how to test if your code correctly implements the mathematical specification of an algorithm.
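To make "testing code against its mathematical specification" concrete, here is a minimal sketch of one such check. It is an illustration under my own assumptions, not necessarily the approach the tutorial takes: the model is a toy bivariate Gaussian with unit variances and correlation `rho`, and all function names are hypothetical. The idea is that a hand-derived conditional distribution must agree with the joint up to a constant, so differences of log conditionals should equal differences of log joints.

```python
import math
import random


def log_joint(x, y, rho):
    """Unnormalized log density of a zero-mean, unit-variance bivariate Gaussian."""
    return -(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho ** 2))


def log_conditional(x, y, rho):
    """Hand-derived log p(x | y): Gaussian with mean rho*y and variance 1 - rho^2."""
    var = 1 - rho ** 2
    return -0.5 * math.log(2 * math.pi * var) - (x - rho * y) ** 2 / (2 * var)


def conditional_matches_joint(log_cond, rho, trials=100, tol=1e-8, seed=0):
    """Check log p(x1|y) - log p(x2|y) == log p(x1,y) - log p(x2,y) at random points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2, y = (rng.gauss(0, 1) for _ in range(3))
        lhs = log_cond(x1, y, rho) - log_cond(x2, y, rho)
        rhs = log_joint(x1, y, rho) - log_joint(x2, y, rho)
        if abs(lhs - rhs) > tol * max(1.0, abs(lhs)):
            return False
    return True


if __name__ == "__main__":
    print(conditional_matches_joint(log_conditional, rho=0.8))
```

A check like this needs no sampling at all, so it is deterministic and fast, and it fails immediately if the conditional is derived or coded incorrectly (for example, using `y` instead of `rho * y` as the mean).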

The Central Limit Theorem

The proof and intuition presented here come from this excellent writeup by Yuval Filmus, which in turn draws upon ideas in this book by Fumio Hiai and Denes Petz. Suppose that we have a sequence of real-valued random variables

\begin{equation*} X_1, X_2, \ldots . \tag{1} \end{equation*}

Define the random variable

\begin{equation*} A_N = \frac{X_1 + \cdots + X_N}{\sqrt{N}} \tag{2} \end{equation*}

to be a scaled sum of the first N variables in the sequence. Now, we would like to make interesting statements about the sequence

\begin{equation*} A_1, A_2, \ldots . \tag{3} \end{equation*}
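A quick simulation gives some feel for why the scaling by the square root of N is the interesting one. This is a sketch under assumptions the excerpt does not state: the variables are taken to be independent and identically distributed with zero mean and unit variance (here, fair coin flips valued -1 or +1), and the function names are made up for illustration.

```python
import random
import statistics


def scaled_sum(n, rng):
    """Draw A_N = (X_1 + ... + X_N) / sqrt(N) with X_i uniform on {-1, +1}."""
    return sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / n ** 0.5


def sample_scaled_sums(n, num_samples, seed=0):
    """Return independent draws of A_N, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [scaled_sum(n, rng) for _ in range(num_samples)]


if __name__ == "__main__":
    for n in (1, 10, 100):
        draws = sample_scaled_sums(n, 5000)
        print(n, statistics.fmean(draws), statistics.pvariance(draws))
```

Under these assumptions the sample mean stays near 0 and the sample variance near 1 for every N, so the sequence neither collapses nor blows up; what changes with N is the shape of the distribution.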


JIT compilation in MATLAB

A few years ago, MATLAB introduced a Just-In-Time (JIT) accelerator under the hood. Because the JIT acceleration runs behind the scenes, it is easy to miss (in fact, MathWorks seems to hide it intentionally so that users do not change their coding style, probably because the JIT accelerator changes regularly). I just wanted to briefly mention what a JIT accelerator is and what it does in MATLAB.

Introspection in AI

I’ve recently come across a fascinating blog post by Cambridge mathematician Tim Gowers. He and computational linguist Mohan Ganesalingam built a sort of automated mathematician that does the kind of “routine” mathematical proofs that mathematicians can do without backtracking. Their system was based on a formal theory of the semantics of mathematical language, together with introspection into how they solved problems. In other words, they worked through lots of simple examples and checked that their AI could solve the problems in a way that was cognitively plausible. The goal wasn’t to build a useful system (standard theorem provers are way more powerful), but to provide insight into our problem-solving process. This post reminded me that, while our field has long since moved away from this style of research, there’s still a lot to be gained from it.

Machine Learning Glossary

I often have a hard time understanding the terminology in machine learning, even after almost three years in the field. For example, what is a Deep Belief Network? I attended a whole summer school on Deep Learning, but I’m still not quite sure. I decided to take a leap of faith and assume this is not just because the Deep Belief Networks in my brain are not functioning properly (although I am sure this is a factor). So, I created a Machine Learning Glossary to try to define some of these terms. The glossary can be found here. I have tried to write in an unpretentious style, defining things systematically and leaving no “exercises for the reader”. I also have a form for readers to request new definitions.