What the hell is representation? *

Roger Grosse’s post on the need for a “solid theoretical framework” for “representation learning” is very intriguing. The term representation is ubiquitous in machine learning (for instance, it showed up in at least eight previous posts in this blog) and computational neuroscience (how are objects and concepts represented within the brain).

My personal fascination with the topic started after watching David Krakauer’s talk on evolution of intelligence on earth, where he listed representation- in additions to inference, strategy, and Competition- as one of the tenets of intelligence; suggesting that our representations are tightly connected to the goals we aim to accomplish, how we infer hidden causes, what strategy we take on, and what competitive forces we have to deal with.

Professor Krakauer goes on to reason that what enabled invention of Algebra was “efficient” representation of numbers via the Arabic numeral system (think about 3998 in Arabic numerals versus Roman numerals: MMMCMXCVIII), which allowed for easier manipulation of numbers (for exmaple, using Arabic numerals: 42 x 133 = 5586, versus using Roman numerals: XLII x CXXXIII = MMMMMDLXXXVI). The Arabic numeral system is not only more compressed, but also yields itself to easier compositional rules (not clear how the second multiplication works!).

So, why representation is important? The short answer is that “good” representation makes life so much easier, presumably, by reducing the computational burden of doing inference/classification/prediction. How can we systematically arrive at a good representation? In general, good representations seem to keep only those features of the data that co-vary the most with respect to outcomes of interest. If so, then there are no “good” representations, only good representations with respect to a set of objectives, given all the computational and resource constraints.

In conclusion, I suspect combining the unsupervised-learning and supervised-learning into one coherent learning framework can be a good starting point (i.e., extracting latent features from the data that are maximally predictive of outcomes of interest; see Outcome Discriminative Learning and deep learning).


There are these two young fish swimming along, and they happen to meet an older fish swimming the other way, who nods at them and says, “Morning, boys, how’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, “What the hell is water?” — David Foster Wallace

On representation and sparsity


Before diving into more technical posts, I want to briefly touch on some basic questions, and the big picture behind unsupervised learning. I also want to do a bit of handwaving on sparsity—a topic that has gotten a lot of attention recently.

Let’s say we are given observations $\mathbf{y}_1,\ldots,\mathbf{y}_N\in\mathbb{R}^D$. These points are assumed to contain some underlying structure, which we seek to capture in order to perform tasks such as classification or compression. We can apply our algorithms on the data in their raw form—which carries unidentified redundancy—and hope for the best. However, a more sensible approach would be to first Continue reading “On representation and sparsity”