Compressing genomes

Michael GelbartBlog, Compression

Here’s an interesting question: how much space would it take to store the genomes of everyone in the world? Well, there are about 3 billion base pairs in a genome, and at 2 bits per base (4 choices), we have 6 billion bits or about 750 MB (say we are only storing one copy of each chromosome). Multiply this by 7 billion people and we have about 4800 petabytes. Ouch! But we can do a lot better.

JIT compilation in MATLAB

Michael GelbartBlog, Computation

A few years ago MATLAB introduced a Just-In-Time (JIT) accelerator under the hood. Because the JIT acceleration runs behind the scenes, it is easy to miss (in fact, MathWorks seems to intentionally hide it so that users do not change their coding style, probably because the JIT accelerator is changed regularly). I just wanted to briefly mention what a JIT accelerator is and what it does in MATLAB.

Machine Learning Glossary

Michael GelbartBlog, Machine Learning

I often have a hard time understanding the terminology in machine learning, even after almost three years in the field. For example, what is a Deep Belief Network? I attended a whole summer school on Deep Learning, but I’m still not quite sure. I decided to take a leap of faith and assume this is not just because the Deep Belief Networks in my brain are not functioning properly (although I am sure this is a factor). So, I created a Machine Learning Glossary to try to define some of these terms. The glossary can be found here. I have tried to write in an unpretentious style, defining things systematically and leaving no “exercises to the reader”. I also have … Read More

Healthy Competition?

Michael GelbartNeuroscienceLeave a Comment

Last week I attended the NIPS 2012 workshop on Connectomics: Opportunities and Challenges for Machine Learning, organized by Viren Jain and Moritz Helmstaedter. Connectomics is an emerging field that aims to map the neural wiring diagram of the brain. The current bottleneck to progress is analyzing the incredibly large (terabyte-petabyte range) data sets of 3d images obtained via electron microscopy. The analysis of the images entails tracing the neurons across images and eventually inferring synaptic connections based on physical proximity and other visual cues. One approach is manual tracing: at the workshop I learned that well over one million dollars has already been spent hiring manual tracers, resulting in data that is useful but many orders of magnitude short of … Read More