Learning Image Features from Video

While at NIPS, I came across the paper Deep Learning of Invariant Features via Simulated Fixations in Video by Will Zou, Shenghuo Zhu, Andrew Ng, and Kai Yu. It proposes a particularly appealing unsupervised method for using videos to learn image features. Their method appears to be somewhat inspired by the human visual system. For instance, people have access to video data, not static images. The authors also attempt to mimic the human tendency to fixate on particular objects: they track objects through successive frames in order to provide more coherent data to the learning algorithm.

The authors use a stacked architecture, where each layer is trained by optimizing an embedding into a feature space. As usual, the optimization problem involves a reconstruction penalty and a sparsity penalty. In addition, however, it includes a temporal slowness penalty, which seeks to minimize the $L_1$ norm of the difference between the feature representations of consecutive frames. This enforces the intuition that good representations of images should change slowly as the images deform. Using this approach, the authors achieve improved performance on various classification tasks.
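The slowness term itself is simple to write down. Here is a minimal NumPy sketch (function name is my own, and the paper's full objective of course also includes the reconstruction and sparsity terms):

```python
import numpy as np

def slowness_penalty(Z):
    """L1 temporal slowness penalty: sum over consecutive frames t of
    ||z_{t+1} - z_t||_1, where Z is a (T, d) array holding the feature
    representations of T consecutive video frames."""
    return np.abs(np.diff(Z, axis=0)).sum()

# Feature trajectories that drift slowly incur a small penalty;
# abrupt jumps in the representation are penalized heavily.
```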

I suspect that even more information is contained in video data than the authors make use of. It makes sense that feature representations ought to change slowly, but they should also change consistently. For instance, an object rotating in one direction will likely continue to rotate in the same direction for several frames. If the lighting dims from one frame to another, it will probably continue to dim for several more frames. In other words, features should change slowly and smoothly. Such "slow and smooth" priors have been successfully used for motion estimation, and they capture something natural about the way images deform.

The authors encoded their intuition about slowly-changing features into the optimization problem by adding a term that penalizes the first differences of the representations. To build in the intuition about smoothly-changing features, we could add a term that penalizes the second differences of the representations. Of course, this opens the possibility of considering third and fourth differences as well, and it would be interesting to see if such higher derivatives give any added benefit.
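As a sketch of what such a term might look like: `np.diff` generalizes directly to higher-order differences, so a single penalty function (name hypothetical, not from the paper) covers slowness (first differences), smoothness (second differences), and any higher derivatives one might want to experiment with:

```python
import numpy as np

def smoothness_penalty(Z, order=2):
    """L1 penalty on the order-th temporal difference of the feature
    trajectories in Z, a (T, d) array of per-frame representations.

    order=1 recovers the slowness penalty of Zou et al.; order=2 is
    the proposed smoothness term, which penalizes changes in the
    *rate* of change rather than change itself."""
    return np.abs(np.diff(Z, n=order, axis=0)).sum()
```

Note that a feature changing at a perfectly constant rate pays nothing under the second-difference penalty, which is exactly the "rotating object keeps rotating" intuition.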


Healthy Competition?

Last week I attended the NIPS 2012 workshop on Connectomics: Opportunities and Challenges for Machine Learning, organized by Viren Jain and Moritz Helmstaedter. Connectomics is an emerging field that aims to map the neural wiring diagram of the brain. The current bottleneck to progress is analyzing the incredibly large (terabyte-petabyte range) data sets of 3d images obtained via electron microscopy. The analysis of the images entails tracing the neurons across images and eventually inferring synaptic connections based on physical proximity and other visual cues. One approach is manual tracing: at the workshop I learned that well over one million dollars has already been spent hiring manual tracers, resulting in data that is useful but many orders of magnitude short of even a very small brain.

The NIPS workshop was about using machine learning to speed up the process, and it consisted of talks, posters, and discussion. A previous workshop on this subject had a different flavor: it was a challenge workshop at ISBI 2012 (a similar idea to the Netflix challenge). To enter the challenge, anyone could download the training set and upload their results on the test data, which were then evaluated before the workshop (results here). At the NIPS workshop, the ISBI challenge was mentioned frequently, and scoring well on it seemed to be an important source of credibility. Such a challenge can have a profound impact on the field, but is it a positive impact?

The advantage of a challenge is that competition motivates us to work harder and get better results. It also enables comparison of different algorithms via a standardized benchmark data set and scoring mechanism. Challenges can also raise the profile of the problem, drawing in outside contributions. But at what cost? The main objection to the challenge is that it may focus everyone’s attention on the wrong goal. For example, the challenge data set is 2 x 2 x 1.5 microns, or 6 cubic microns (about 7.5 megavoxels). But, if the goal is to segment a mouse brain of about a cubic millimeter, then the challenge data set is about 8 orders of magnitude too small. On the one hand, the challenge should not be made so difficult as to be impossible, but on the other hand, what if the winning algorithm is viable on small data sets and hopeless at scale?
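A quick back-of-the-envelope check of that scale gap:

```python
import math

challenge_volume_um3 = 2 * 2 * 1.5    # ISBI challenge volume: 6 cubic microns
mouse_brain_um3 = 1000 ** 3           # ~1 cubic millimeter, in cubic microns

ratio = mouse_brain_um3 / challenge_volume_um3
orders_of_magnitude = math.log10(ratio)   # roughly 8.2
```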

Another issue is that the scoring mechanism may not really be measuring progress. I was pleased to see that the challenge evaluated segmentations based on the warping error and Rand error in addition to the pixel error. These newer metrics, brought into connectomics through work by Viren Jain and Srini Turaga in Sebastian Seung’s lab, measure segmentation quality based on topological considerations instead of just the number of correctly classified pixels, thereby penalizing mainly the types of errors that matter to the end goal of connectomics. However, just as I was starting to feel good about things, David Cox suggested that since the end goal is a wiring diagram, perhaps segmentation is not a necessary intermediate step at all for transforming images into connectivities. While it is not immediately clear to me what the alternatives are, David’s comment made me see a bigger picture that the challenge and its scoring details had been distracting me from. It was a good point in itself, but also a good illustration of how challenges may hinder out-of-the-box thinking and be detrimental to progress.
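For concreteness, the Rand error is essentially the fraction of pixel pairs on which two segmentations disagree about whether the pixels belong to the same segment. An illustrative sketch (my own simplification for intuition, not the exact ISBI scoring code, which uses far more efficient formulas):

```python
import numpy as np
from itertools import combinations

def rand_error(seg, gt):
    """Fraction of pixel pairs on which the proposed segmentation `seg`
    and the ground truth `gt` disagree about same-segment membership.

    This captures the topological flavor of the metric: one mis-merged
    pixel that fuses two neurons flips many pairs at once, while an
    isolated misclassified pixel barely moves the score."""
    seg, gt = np.ravel(seg), np.ravel(gt)
    pairs = list(combinations(range(len(seg)), 2))
    disagreements = sum(
        (seg[i] == seg[j]) != (gt[i] == gt[j]) for i, j in pairs
    )
    return disagreements / len(pairs)
```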

Indeed, in my personal opinion the future of connectomics lies in cleverly interleaving human and machine intelligence to tackle this enormous task. I am not alone in this view: EyeWire, which had its official launch last week, is a platform for crowdsourcing segmentation for connectomics. How would such solutions fit into the challenge? The challenge rules state that the challenge training set “is the only data that participants are allowed to use to train their algorithms.” Presumably this disallows crowdsourcing at test time. But, if the ultimate solution will require a mix of humans and machines, why insist on a fully automatic algorithm for the challenge? And if you did allow crowdsourcing methods, how would you evaluate competing systems, taking into account time, money, and other resources spent?

These are difficult questions, and I do not have all the answers. But I think understanding the impact of competitive challenges is an important question and I invite more discussion and careful consideration of the issue.


New Blog

I'm excited to announce a new collaborative blog, written by members of the Harvard Intelligent Probabilistic Systems group.  Broadly, our group studies machine learning, statistics, and computational neuroscience, but we're interested in lots of things outside these areas as well.  The idea is to use this as a venue to discuss interesting ideas and results -- new and old -- about probabilistic modeling, inference, artificial intelligence, theoretical neuroscience, or anything else research-related that strikes our fancy.  There will be posts from folks at both Harvard and MIT, in computer science, mathematics, biophysics, and BCS departments, so expect a wide variety of interests.


-- Ryan Adams
