## New Blog

I’m excited to announce a new collaborative blog, written by members of the Harvard Intelligent Probabilistic Systems group.  Broadly, our group studies machine learning, statistics, and computational neuroscience, but we’re interested in lots of things outside these areas as well.  The idea is to use this as a venue to discuss interesting ideas and results — new and old — about probabilistic modeling, inference, artificial intelligence, theoretical neuroscience, or anything else research-related that strikes our fancy.  There will be posts from folks at both Harvard and MIT, in computer science, mathematics, biophysics, and BCS departments, so expect a wide variety of interests.

## Healthy Competition?

Last week I attended the NIPS 2012 workshop on Connectomics: Opportunities and Challenges for Machine Learning, organized by Viren Jain and Moritz Helmstaedter. Connectomics is an emerging field that aims to map the neural wiring diagram of the brain. The current bottleneck to progress is analyzing the incredibly large (terabyte-to-petabyte range) data sets of 3D images obtained via electron microscopy. The analysis of the images entails tracing the neurons across images and eventually inferring synaptic connections based on physical proximity and other visual cues. One approach is manual tracing: at the workshop I learned that well over one million dollars has already been spent hiring manual tracers, resulting in data that is useful but many orders of magnitude short of even a very small brain.

The NIPS workshop was about using machine learning to speed up the process, and it consisted of talks, posters, and discussion. A previous workshop on this subject had a different flavor: it was a challenge workshop at ISBI 2012 (a similar idea to the Netflix challenge). To enter the challenge, anyone could download the training set and upload their results on the test data, which were then evaluated before the workshop (results here). At the NIPS workshop, the ISBI challenge was mentioned frequently, and scoring well on it seemed to be an important source of credibility. Such a challenge can have a profound impact on the field, but is it a positive impact? Continue reading “Healthy Competition?”

## Learning Image Features from Video

While at NIPS, I came across the paper Deep Learning of Invariant Features via Simulated Fixations in Video by Will Zou, Shenghuo Zhu, Andrew Ng, and Kai Yu. It proposes a particularly appealing unsupervised method for using videos to learn image features. Their method appears to be somewhat inspired by the human visual system. For instance, it learns from video rather than static images, just as people do. The authors also attempt to mimic the human tendency to fixate on particular objects: they track objects through successive frames in order to provide more coherent data to the learning algorithm.

The authors use a stacked architecture, where each layer is trained by optimizing an embedding into a feature space. As usual, the optimization problem involves a reconstruction penalty and a sparsity penalty. In addition, however, it includes a temporal slowness penalty, which seeks to minimize the $$L_1$$ norm of the difference between the feature representations of consecutive frames. This enforces the intuition that good representations of images should change slowly as the images deform. Using this approach, the authors achieve improved performance on various classification tasks. Continue reading “Learning Image Features from Video”
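To make the three penalties concrete, here is a minimal sketch of what such an objective could look like for a single layer. It is not the paper's formulation: the linear encoder, the tied decoder weights, and the knobs `sparsity_weight` and `slowness_weight` are all assumptions made for illustration.

```python
def l1_norm(v):
    """L1 norm of a vector stored as a list of floats."""
    return sum(abs(a) for a in v)


def encode(W, x):
    """Linear features z = W x, where W is a list of rows (an assumed encoder)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def decode(W, z):
    """Reconstruction x_hat = W^T z, using tied weights (also an assumption)."""
    n_inputs = len(W[0])
    return [sum(row[j] * zk for row, zk in zip(W, z)) for j in range(n_inputs)]


def slowness_penalty(features):
    """Sum of L1 norms of feature differences between consecutive frames."""
    return sum(
        l1_norm([a - b for a, b in zip(z_now, z_next)])
        for z_now, z_next in zip(features, features[1:])
    )


def objective(W, frames, sparsity_weight=0.1, slowness_weight=1.0):
    """Reconstruction error + sparsity penalty + temporal slowness penalty."""
    features = [encode(W, x) for x in frames]
    reconstruction = sum(
        sum((xi - ri) ** 2 for xi, ri in zip(x, decode(W, z)))
        for x, z in zip(frames, features)
    )
    sparsity = sum(l1_norm(z) for z in features)
    return (reconstruction
            + sparsity_weight * sparsity
            + slowness_weight * slowness_penalty(features))


# Two tiny "videos": one where the frame stays put, one where it jumps.
W = [[1.0, 0.0], [0.0, 1.0]]
steady = [[1.0, 0.0], [1.0, 0.0]]
jumpy = [[1.0, 0.0], [0.0, 1.0]]
print(objective(W, steady), objective(W, jumpy))
```

Both toy sequences reconstruct perfectly and are equally sparse, so the only term that separates them is the slowness penalty, which is what pushes the features toward representations that change little from frame to frame.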

## The Poisson Estimator

[latexpage]Much of what we do when we analyze data and invent algorithms is think about estimators for unknown quantities, even when we don’t directly phrase things this way.  One type of estimator that we commonly encounter is the Monte Carlo estimator, which approximates expectations via the sample mean.  That is, many problems in which we are interested involve a distribution $\pi$ on a space $\mathcal{X}$, where we wish to calculate the expectation of a function $f(x)$:

\begin{align*}
\hat{f}_{\pi} &= \int_{\mathcal{X}} \pi(x)\,f(x)\,\mathrm{d}x\\
&\approx \frac{1}{N}\sum_{n=1}^N f(x_n) \qquad \text{where } x_n \sim \pi.
\end{align*}

This is very nice because it gives you an unbiased estimator of $\hat{f}_\pi$.  That is, the expectation of this estimator is the desired quantity.  However, one issue that comes up very often is that we want to find an unbiased estimator of a quantity that is a function of an expectation.  Of course, we know that the expectation of a function is not in general the function of the expectation, so we can’t simply plug the Monte Carlo estimate into the function and wind up with an unbiased estimator. Continue reading “The Poisson Estimator”
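A small simulation makes the bias visible. This is only a sketch under assumptions that are mine, not the post's: $\pi$ is a standard normal, $f(x) = x$, and the function applied to the expectation is the exponential. The sample mean is unbiased for $E[x] = 0$, but applying `exp` to it overshoots $\exp(E[x]) = 1$ on average.

```python
import math
import random


def sample_mean(n, rng):
    """Monte Carlo estimate of E[x] from n draws of x ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


def plug_in_exp(n, rng):
    """Naive estimate of exp(E[x]): apply exp to the sample mean."""
    return math.exp(sample_mean(n, rng))


def average_over_replicates(estimator, n, replicates, seed=0):
    """Average an estimator over many independent replicates to expose its bias."""
    rng = random.Random(seed)
    return sum(estimator(n, rng) for _ in range(replicates)) / replicates


n = 5
mean_of_means = average_over_replicates(sample_mean, n, 20000)
mean_of_plug_in = average_over_replicates(plug_in_exp, n, 20000)

# The targets are E[x] = 0 and exp(E[x]) = 1.
# In this Gaussian case, E[exp(sample mean)] = exp(1 / (2 n)), which exceeds 1.
print("sample mean, averaged:", mean_of_means)
print("exp(sample mean), averaged:", mean_of_plug_in)
print("analytic value for the plug-in:", math.exp(1.0 / (2 * n)))
```

The bias shrinks as `n` grows, but for any finite sample size the plug-in estimator stays biased, which is why a different construction is needed when unbiasedness matters.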

## The “Computation” in Computational Neuroscience

My aim in this introductory post is to provide context for future contributions by sharing my thoughts on the role of computer scientists in the study of the brain, particularly in the field of computational neuroscience. For a field with “computation” in the name, it seems that computer scientists are underrepresented. I believe there are significant open questions in neuroscience which are best addressed by those who study the theory of computation, learning, and algorithms, and the systems upon which they are premised.

In my opinion, “computational neuroscience” has two definitions: the first, from Marr, is the study of the computational capabilities of the brain, their algorithmic details, and their implementation in neural circuits; the second, stemming from machine learning, is the design and application of computational algorithms, either to solve problems in a biologically inspired manner, or to aid in the processing and interpretation of neural data. Though quite different, I believe these are complementary and arguably co-dependent endeavors. The forward hypothesis generation advocated by the former seems unlikely to get the details right without the aid of computational and statistical tools for extracting patterns from neural recordings, guiding hypothesis generation, and comparing the evidence for competing models. Likewise, attempts to infer the fundamentals of neural computation from the bottom up without strong inductive biases appear doomed to wander the vastness of the hypothesis space. How, then, can computer scientists contribute to both aspects of computational neuroscience? Continue reading “The “Computation” in Computational Neuroscience”

## Turning Theory into Algorithms

[latexpage] Some of the common complaints I hear about (learning) theoretical work run along the lines of “those bounds are meaningless in practice,” “that result doesn’t apply to any algorithm someone would actually use,” and “you lost me as soon as martingales/Banach spaces/measure-theoretic niceties/… got involved.” I don’t have a good answer for the last concern, but a very nice paper by Sasha Rakhlin, Ohad Shamir, and Karthik Sridharan at NIPS this year goes some way toward addressing the first two criticisms. Their paper, “Relax and Randomize: From Value to Algorithms,” (extended version here) is concerned with transforming non-constructive online regret bounds into useful algorithms. Continue reading “Turning Theory into Algorithms”

## Should neurons be interpretable?

One basic aim of cognitive neuroscience is to answer questions like 1) what does a neuron or a group of neurons represent, and 2) how is cognitive computation implemented in neuronal hardware?  A common critique is that the field has simply failed to shed light on either of these questions. Our experimental techniques are perhaps too crude: fMRI’s temporal resolution is far too low, EEG and MEG’s spatial resolution is far too coarse, and electrode recordings miss the forest for the trees. But underlying these criticisms is the assumption that there is some implementation-level description of neural activity that is interpretable at the level of cognition: if only we recorded from a sufficient number of neurons and actually knew what the underlying connectivity looked like, then we could finally figure out what neurons are doing, and what they represent, whether it’s features, or population codes, or prediction error, or whatever.

Is this a reasonable thing to hope for? Should neurons be interpretable at all? Clearly, no, Marr Level-1-ophiles will argue. After all, you wouldn’t hope to learn how a computer works by watching its bits flip, right?

## Method of moments

[latexpage]

The method of moments is a simple idea for estimating the parameters of a model. Suppose the observed data are sampled iid from a distribution $p(x|\theta)$, and our goal is to estimate the parameter $\theta$. If we have enough data, we can get very close to the true mean of the distribution, $E[x]$, which is a function of $\theta$: $E[x]=f_1(\theta)$. We know the form of $f_1$ since we know the form of $p(x|\theta)$.

For simple distributions, knowing just the mean is enough to invert $f_1$ to obtain $\theta$. In general, we need to calculate the higher moments, which are also known functions of $\theta$: $E[x^2] = f_2(\theta), \ldots, E[x^n]=f_n(\theta)$. We then try to invert the system of equations $f_1, \ldots, f_n$ to obtain $\theta$. In practice, we typically only have enough data to accurately estimate the low order ($\leq 3$) moments. Continue reading “Method of moments”
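As a concrete instance of inverting the moment equations, here is a minimal sketch for a Gamma distribution with shape $k$ and scale $s$, a family chosen purely for illustration. Its first two moments are $E[x] = ks$ and $E[x^2] = k(k+1)s^2$, so the empirical moments can be inverted in closed form.

```python
import random


def gamma_method_of_moments(samples):
    """Estimate Gamma(shape, scale) by matching the first two moments.

    For this family E[x] = shape * scale and
    E[x^2] = shape * (shape + 1) * scale^2, which invert in closed form.
    """
    n = len(samples)
    m1 = sum(samples) / n                  # empirical first moment
    m2 = sum(x * x for x in samples) / n   # empirical second moment
    variance = m2 - m1 * m1                # equals shape * scale^2
    scale = variance / m1
    shape = m1 * m1 / variance
    return shape, scale


# Synthetic data with known parameters, so the estimates can be checked.
rng = random.Random(0)
true_shape, true_scale = 2.0, 3.0
data = [rng.gammavariate(true_shape, true_scale) for _ in range(50000)]
shape_hat, scale_hat = gamma_method_of_moments(data)
print("estimated shape and scale:", shape_hat, scale_hat)
```

Two parameters need two moment equations here; a model with more parameters would need higher moments, which are exactly the ones that are harder to estimate accurately from limited data.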

## Discriminative (supervised) Learning

Often the goal of inference and learning is to use the inferred marginal distributions for prediction or classification purposes. In such scenarios, finding the correct “model structure” or the true “model parameters”, via maximum-likelihood (ML) estimation or (generalized) expectation-maximization (EM), is secondary to the final objective of minimizing a prediction or a classification cost function. Recently, I came across a few interesting papers on learning and inference in graphical models by direct optimization of a cost function of the inferred marginal distributions (or normalized beliefs) [1, 2, 3, 4]:

$$e = C(\text{outcomes}, f(\text{bs}); \Theta)$$,

where f is a differentiable function that maps the beliefs (bs) to the outcomes/labels of interest, $$\Theta$$ is a set of model parameters, and C is a differentiable cost function that penalizes incorrect classifications or predictions. Continue reading “Discriminative (supervised) Learning”

## Nonparanormal Activity

[latexpage]

Say you have a set of $n$ iid samples $\{ \textbf x_i \}_{i=1}^n$, each $p$-dimensional, drawn from some unknown continuous distribution that you want to estimate with an undirected graphical model. You can sometimes get away with assuming the $\textbf x_i$’s are drawn from a multivariate normal (MVN), and from there you can use a host of methods for estimating the covariance matrix $\Sigma$, and thus the graph structure encoded by the precision matrix $\Omega = \Sigma^{-1}$ (perhaps imposing sparsity constraints for inferring structure in high-dimensional data, $n \ll p$).

In other cases the Gaussian assumption is too restrictive (e.g. when marginals exhibit multimodal behavior).

One way to augment the expressivity of the MVN while maintaining some of the desirable properties is to assume that some function of the data is MVN. Continue reading “Nonparanormal Activity”
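To give a feel for what "some function of the data is MVN" can buy you, here is a minimal sketch using one common choice that I am assuming for illustration: a rank-based normal-scores transform applied to each coordinate separately. The synthetic data and the helper names `normal_scores` and `correlation` are hypothetical, and a practical estimator would treat the empirical CDF more carefully (for example, near its tails).

```python
import math
import random
from statistics import NormalDist


def normal_scores(column):
    """Map a column to standard-normal quantiles of its ranks (assumes no ties)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the argument strictly inside (0, 1).
        scores[i] = standard_normal.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic data: a latent bivariate normal with correlation rho, observed
# only through monotone distortions that make the marginals non-Gaussian.
rng = random.Random(0)
rho = 0.8
latent = []
for _ in range(2000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    latent.append((z1, z2))
observed_1 = [math.exp(z1) for z1, _ in latent]
observed_2 = [z2 ** 3 for _, z2 in latent]

raw_corr = correlation(observed_1, observed_2)
transformed_corr = correlation(normal_scores(observed_1), normal_scores(observed_2))
print("correlation of raw data:", raw_corr)
print("correlation after the normal-scores transform:", transformed_corr)
```

Because the transform depends only on ranks, it is unaffected by the monotone distortions, so the correlation of the transformed data tracks the latent `rho`. With more than two variables, the same transformed data could be handed to any of the usual Gaussian covariance or precision estimators.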