When you take a machine learning class, there’s a good chance it’s divided into a unit on supervised learning and a unit on unsupervised learning. We certainly care about this distinction for a practical reason: often there’s orders of magnitude more data available if we don’t need to collect ground-truth labels. But we also tend to think it matters for more fundamental reasons. In particular, the following are some common intuitions:
One big recent piece of news from across the Atlantic is that the European Commission is funding a brain simulation project to the tune of a billion Euros. Roughly, the objective is to simulate the entire human brain using a supercomputer. Needless to say, many people are skeptical and there are lots of reasons that one might think this project is unlikely to yield useful results. One criticism centers on whether even a supercomputer can simulate the complexity of the brain. A first step towards simulating a brain is thinking about how many FLOP/s (floating point operations per second) would be necessary to implement similar functionality in conventional computer hardware. Here I will discuss two back-of-the-envelope estimates of this computational … Read More
In today’s New York Times, Huw Price, professor of philosophy at Cambridge, writes about the need for considering the potential dangers associated with a possible “singularity.” The singularity is the idea, I guess, that if people create machines that are smarter than people then those machines would be smart enough to create machines smarter than themselves, etc., and that there would be an exponential explosion in artificial intelligence. Price suggests that whether or not the singularity is likely enough to warrant study in its own right, it is the possible danger associated with it that makes it important. I’m not remotely worried about this. As someone who has been toiling away for many months at creating an artificial intelligence algorithm … Read More
I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy. To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are tasks and workers, and assume that the possible labels are . Define the matrix so that is the label given to task by worker (or … Read More
A common activity in statistics and machine learning is optimization. For instance, finding maximum likelihood and maximum a posteriori estimates require maximizing the likilihood function and posterior distribution respectively. Another example, and the motivating example for this post, is using variational inference to approximate a posterior distribution. Suppose we are interested in a posterior distribution, , that we cannot compute analytically. We will approximate with the variational distribution that is parameterized by the variational parameters . Variational inference then proceeds to minimize the KL divergence from to , . The dominant assumption in machine learning for the form of is a product distribution, that is (where we assume there are variational parameters). It can be shown that minimizing is equivalent … Read More
Developing efficient (i.e. polynomial time) algorithms with guaranteed performance is a central goal in computer science (perhaps the central goal). In machine learning, inference algorithms meeting these requirements are much rarer than we would like: often, an algorithm is either efficient but doesn’t perform optimally or vice versa. A number of results from the 1990’s demonstrate the challenges of, but also the potential for, efficient Bayesian inference. These results were carried out in the context of Bayesian networks.
In my last blog post I wrote about the asymptotic equipartition principle. This week I will write about something completely unrelated. This blog post evolved from a discussion with Brendan O’Connor about science and evidence. The back story is as follows.
I just attended a fun event, Celebrating 100 Years of Markov Chains, at the Institute for Applied Computational Science. There were three talks and they were taped, so hopefully you will be able to find the videos through the IACS website in the near future. Below, I will review some highlights of the first two talks by Brian Hayes and Ryan Adams; I’m skipping the last one because it was more of a review of concepts building up to and surrounding Markov chain Monte Carlo (MCMC). The first talk was intriguingly called “First Links in the Markov Chain: Poetry and Probability”
In the spirit of Ryan’s most recent post, I will discuss a fundamental snippet from numerical linear algebra that facilitates computation for the same price of not facilitating it. In our everyday lives, we often come across theoretical expressions that involve matrix inverses stapled to vectors, such as with . When we proceed to code this up, it is very tempting to first compute . Resist doing this! There are several points for why there is no point to actually find an explicit, tangible inverse.
There are several topics I would like to talk about in the future posts, and they generally fall under the category of theoretical (or systems) neuroscience, and sometimes more broadly biological physics. The topics to be discussed include: the problem of invariance in theoretical neuroscience, Schrodinger’s take on physics of living matter and other modern thoughts on fundamental principles underlying biology (optimality principle, role of noise, etc), dynamics and computation: are they mutually exclusive concepts?, correlations (correlations in statistical physics, and the role of correlation in an ensemble of neurons), reinforcement learning, and more. I will start with the post on invariance.