Last week I attended the NIPS 2012 workshop on Connectomics: Opportunities and Challenges for Machine Learning, organized by Viren Jain and Moritz Helmstaedter. Connectomics is an emerging field that aims to map the neural wiring diagram of the brain. The current bottleneck to progress is analyzing the enormous (terabyte-to-petabyte range) data sets of 3D images obtained via electron microscopy. Analyzing the images entails tracing neurons across images and eventually inferring synaptic connections from physical proximity and other visual cues. One approach is manual tracing: at the workshop I learned that well over one million dollars has already been spent hiring manual tracers, yielding data that is useful but still many orders of magnitude short of covering even a very small brain.
The NIPS workshop was about using machine learning to speed up the process, and it consisted of talks, posters, and discussion. A previous workshop on this subject had a different flavor: it was a challenge workshop at ISBI 2012 (a similar idea to the Netflix challenge). To enter the challenge, anyone could download the training set and upload their results on the test data, which were then evaluated before the workshop (results here). At the NIPS workshop, the ISBI challenge was mentioned frequently, and scoring well on it seemed to be an important source of credibility. Such a challenge can have a profound impact on the field, but is it a positive impact?
The advantage of a challenge is that competition motivates us to work harder and get better results. It also enables comparison of different algorithms via a standardized benchmark data set and scoring mechanism. Challenges can also raise the profile of the problem, drawing in outside contributions. But at what cost? The main objection to the challenge is that it may focus everyone's attention on the wrong goal. For example, the challenge data set is 2 x 2 x 1.5 microns, or 6 cubic microns (about 7.5 megavoxels). But if the goal is to segment a cubic millimeter of mouse brain, then the challenge data set is about eight orders of magnitude too small. On the one hand, the challenge should not be made so difficult as to be impossible, but on the other hand, what if the winning algorithm is viable on small data sets and hopeless at scale?
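To make that size gap concrete, here is a quick back-of-envelope sketch using only the volumes quoted above (my own illustration, not anything from the challenge itself):

```python
import math

# Back-of-envelope scale check, using the volumes quoted in the text above.
challenge_um3 = 2 * 2 * 1.5      # challenge volume: 6 cubic microns
target_um3 = 1000 ** 3           # a cubic millimeter = 10^9 cubic microns

ratio = target_um3 / challenge_um3
print(f"target / challenge ~ 10^{math.log10(ratio):.1f}")
# ~10^8.2, i.e. roughly eight orders of magnitude
```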
Another issue is that the scoring mechanism may not really be measuring progress. I was pleased to see that the challenge evaluated segmentations based on the warping error and Rand error in addition to the pixel error. These newer metrics, brought into connectomics through work by Viren Jain and Srini Turaga in Sebastian Seung's lab, measure segmentation quality based on topological considerations instead of just the number of correctly classified pixels, thereby penalizing mainly the types of errors that matter for the end goal of connectomics. However, just as I was starting to feel good about things, David Cox suggested that since the end goal is a wiring diagram, perhaps segmentation is not even a necessary intermediate step for transforming images into connectivity. While it is not immediately clear to me what alternatives are available, David's comment made me step back and see a bigger picture that the challenge and its scoring details had been distracting me from. It was a good point in itself, and also a good illustration of how challenges may hinder out-of-the-box thinking and be detrimental to progress.
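To give a feel for why the Rand error rewards topological correctness, here is a minimal pair-counting sketch in Python (my own illustration, not the challenge's official evaluation code): it asks, for every pair of pixels, whether the two segmentations agree on "same segment" versus "different segments", so a single merge or split flips many pairs at once.

```python
import numpy as np

def rand_error(seg, gt):
    """Rand error between two label images: the fraction of pixel pairs on which
    the segmentations disagree about 'same segment' vs 'different segments'.
    A minimal pair-counting sketch, not the ISBI challenge's official scorer."""
    seg = np.asarray(seg).ravel()
    gt = np.asarray(gt).ravel()
    n = seg.size

    # Contingency counts: how many pixels fall in each (gt label, seg label) combination.
    _, joint_counts = np.unique(np.stack([gt, seg], axis=1), axis=0, return_counts=True)
    _, gt_counts = np.unique(gt, return_counts=True)
    _, seg_counts = np.unique(seg, return_counts=True)

    def pairs(sizes):  # number of unordered pixel pairs within groups of the given sizes
        s = np.asarray(sizes, dtype=np.float64)
        return (s * (s - 1) / 2).sum()

    total = n * (n - 1) / 2
    both = pairs(joint_counts)   # pairs co-segmented in both segmentations
    agreements = both + (total - pairs(gt_counts) - pairs(seg_counts) + both)
    return 1.0 - agreements / total   # 1 - Rand index

# Toy example: merging two ground-truth segments into one flips every
# cross-segment pixel pair, so topological mistakes dominate the score.
gt  = np.array([[1, 1, 2, 2],
                [1, 1, 2, 2]])
seg = np.ones_like(gt)            # everything merged into a single segment
print(rand_error(seg, gt))        # ~0.57: over half of all pixel pairs disagree
```

In this toy example a single merge costs more than half of all pixel pairs, even though only the pixels along the erased boundary are actually mislabeled, which is exactly the kind of penalty structure the connectomics end goal calls for.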
Indeed, in my personal opinion the future of connectomics lies in cleverly interleaving human and machine intelligence to tackle this enormous task. I am not alone in this view: EyeWire, which had its official launch last week, is a platform for crowdsourcing segmentation for connectomics. How would such solutions fit into the challenge? The challenge rules state that the challenge training set “is the only data that participants are allowed to use to train their algorithms.” Presumably this disallows crowdsourcing at test time. But, if the ultimate solution will require a mix of humans and machines, why insist on a fully automatic algorithm for the challenge? And if you did allow crowdsourcing methods, how would you evaluate competing systems, taking into account time, money, and other resources spent?
These are difficult questions, and I do not have all the answers. But I think understanding the impact of competitive challenges is important, and I invite more discussion and careful consideration of the issue.