I’ve recently come across a fascinating blog post by Cambridge mathematician Tim Gowers. He and computational linguist Mohan Ganesalingam built a sort of automated mathematician which does the kind of “routine” mathematical proofs that mathematicians can do without backtracking. Their system was based on a formal theory of the semantics of mathematical language, together with introspection into how they solved problems. In other words, they worked through lots of simple examples and checked that their AI could solve the problems in a way that was cognitively plausible. The goal wasn’t to build a useful system (standard theorem provers are way more powerful), but to provide insight into our problem solving process. This post reminded me that, while our field has long moved away from this style of research, I think there’s still a lot to be gained from it.

# Category: Meta

## Learning Theory: Purely Theoretical?

What’s learning theory good for, anyway? As I mentioned in my earlier blog post, I not infrequently get into conversations with people in machine learning and related fields who don’t see the benefit of learning theory (that is, theory of learning). While that post offered one specific piece of evidence of how work seemingly relevant only to pure theory could lead to practical algorithms, I thought I would talk in more general terms about why I see learning theory as a worthwhile endeavor.

There are two main flavors of learning theory: statistical learning theory (StatLT) and computational learning theory (CompLT). StatLT originated with Vladimir Vapnik, while the canonical example of CompLT, PAC learning, was formulated by Leslie Valiant. StatLT, in line with its “statistical” descriptor, focuses on asymptotic questions (though generally based on useful non-asymptotic bounds). It is less concerned with computational efficiency, which is where CompLT comes in. Computer scientists are all about efficient algorithms (which for the purposes of theory essentially means polynomial vs. super-polynomial time). Generally, StatLT results apply to a wider variety of hypothesis classes, with few or no assumptions made about the concept class (a concept class refers to the class of functions to which the data generating mechanism belongs). CompLT results apply to very specific concept classes but have stronger performance guarantees, often achieved with polynomial-time algorithms. I’ll do my best to defend both flavors, while also mentioning some of their limitations.
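
To give a feel for what a non-asymptotic guarantee looks like, here is a minimal sketch of the textbook PAC sample bound for a finite hypothesis class in the realizable case: if a learner outputs any hypothesis consistent with m ≥ (ln|H| + ln(1/δ)) / ε training examples, then with probability at least 1 − δ its error is at most ε. The function name `pac_sample_bound` and the example class (conjunctions over 10 Boolean variables, counted as 3^10 hypotheses) are my own illustrative choices, not something taken from the posts above.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis class (realizable case) to reach error <= epsilon with
    probability >= 1 - delta.

    Standard bound: m >= (ln|H| + ln(1/delta)) / epsilon.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


# Hypothetical example: conjunctions over n = 10 Boolean variables, where each
# variable appears positively, appears negated, or is absent (3**n hypotheses).
n_variables = 10
m = pac_sample_bound(3 ** n_variables, epsilon=0.1, delta=0.05)
print(m)  # 140
```

Note that the bound grows only logarithmically in the size of the hypothesis class and in 1/δ, but linearly in 1/ε. It says nothing about how long it takes to find a consistent hypothesis, which is exactly the computational question CompLT cares about.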

## Is AI scary?

In today’s New York Times, Huw Price, professor of philosophy at Cambridge, writes about the need for considering the potential dangers associated with a possible “singularity.” The singularity is the idea, I guess, that if people create machines that are smarter than people then those machines would be smart enough to create machines smarter than themselves, etc., and that there would be an exponential explosion in artificial intelligence. Price suggests that whether or not the singularity is likely enough to warrant study in its own right, it is the possible danger associated with it that makes it important.

I’m not remotely worried about this. As someone who has been toiling away for many months at creating an artificial intelligence algorithm that has something evolutionary about it, I feel that my pessimism (or, optimism, as Price calls it) is informed. But rather than try to explain why I’m pessimistic, I thought I would present and react to only one point that Price makes. He writes:

biology got us onto this exalted peak in the landscape, the tricks are all there for our inspection: most of it is done with the glop inside our skulls. Understand that, and you understand how to do it artificially, at least in principle. Sure, it could turn out that there’s then no way to improve things – that biology, despite all the constraints, really has hit some sort of fundamental maximum. Or it could turn out that the task of figuring out how biology did it is just beyond us, at least for the foreseeable future (even the remotely foreseeable future). But again, are you going to bet your grandchildren on that possibility?

## It Depends on the Model

In my last blog post I wrote about the asymptotic equipartition principle. This week I will write about something completely unrelated.

This blog post evolved from a discussion with Brendan O’Connor about science and evidence. The back story is as follows.

## Markov chain centenary

I just attended a fun event, Celebrating 100 Years of Markov Chains, at the Institute for Applied Computational Science. There were three talks and they were taped, so hopefully you will be able to find the videos through the IACS website in the near future. Below, I will review some highlights of the first two talks by Brian Hayes and Ryan Adams; I’m skipping the last one because it was more of a review of concepts building up to and surrounding Markov chain Monte Carlo (MCMC).

The first talk was intriguingly called “First Links in the Markov Chain: Poetry and Probability.”

## Machine Learning that Doesn’t Matter

At ICML last year, Kiri Wagstaff (KW) delivered a plenary talk and accompanying paper entitled “Machine Learning that Matters.” KW, a researcher at the NASA Jet Propulsion Laboratory (JPL), draws attention to a number of very serious issues in the field but draws conclusions that differ from my own.

KW criticizes existing benchmark data sets such as the UCI data sets or the MNIST handwritten digit data set for being irrelevant or obsolete. I certainly agree that being state-of-the-art on MNIST is not necessarily important (see my last post for more discussion on the need to carefully craft competitions based on benchmark data sets).

## Should neurons be interpretable?

One basic aim of cognitive neuroscience is to answer questions like 1) what does a neuron or a group of neurons represent, and 2) how is cognitive computation implemented in neuronal hardware? A common critique is that the field has simply failed to shed light on either of these questions. Our experimental techniques are perhaps too crude: fMRI is way too slow, EEG and MEG’s spatial resolution is far too coarse, and electrode recordings miss the forest for the trees. But underlying these criticisms is the assumption that there is some implementation-level description of neural activity that is interpretable at the level of cognition: if only we recorded from a sufficient number of neurons and actually knew what the underlying connectivity looked like, then we could finally figure out what neurons are doing and what they represent, whether it’s features, or population codes, or prediction error, or whatever.

Is this a reasonable thing to hope for? Should neurons be interpretable at all? Clearly, no, Marr Level-1-ophiles will argue. After all, you wouldn’t hope to learn how a computer works by watching its bits flip, right?

## New Blog

I’m excited to announce a new collaborative blog, written by members of the Harvard Intelligent Probabilistic Systems group. Broadly, our group studies machine learning, statistics, and computational neuroscience, but we’re interested in lots of things outside these areas as well. The idea is to use this as a venue to discuss interesting ideas and results — new and old — about probabilistic modeling, inference, artificial intelligence, theoretical neuroscience, or anything else research-related that strikes our fancy. There will be posts from folks at both Harvard and MIT, in computer science, mathematics, biophysics, and BCS departments, so expect a wide variety of interests.

— Ryan Adams