In my previous post, I wrote about why I find learning theory to be a worthwhile endeavor. In this post I want to discuss a few open/under-explored areas in learning theory that I think are particularly interesting. If you know of progress in these areas, please email me or post in the comments.
1. Humans seem to operate well with very little data. What assumptions are we as humans making that differ from those in learning theory? These assumptions are probably something stronger than distribution-independence (as in traditional PAC learning and statistical learning theory) but much weaker than assuming a uniform distribution (which is studied, for example, in variants of the PAC framework). Defining such in-between, “natural” distributions is very challenging. Luckily, the computational complexity theory community deals with similar issues, so some very smart people have thought about this. Complexity theorists have defined various notions of “average-case complexity,” as well as the concept of “smoothed analysis,” which attempts to eliminate “pathological cases” (essentially those that only occur when operating with infinite precision). A big triumph of smoothed analysis was showing that the simplex algorithm for solving linear programs runs in polynomial time under reasonable smoothing conditions. It would be interesting to develop similar results for learning, perhaps showing that it’s possible to learn (up to the necessary error) certain classes of noisified concepts.
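To make the smoothed-analysis idea concrete, here is the standard definition (due to Spielman and Teng), sketched in generic notation rather than tied to any particular learning result:

```latex
% Smoothed complexity of an algorithm A with running time T_A:
% take the worst case over (normalized) inputs \bar{x}, but the
% expectation over a small Gaussian perturbation of that input.
C_{\mathrm{smooth}}(n, \sigma)
  \;=\;
  \max_{\|\bar{x}\| \le 1}\;
  \mathbb{E}_{g \sim \mathcal{N}(0,\, \sigma^2 I)}
  \big[\, T_A(\bar{x} + g) \,\big]
```

For the simplex method, Spielman and Teng showed this quantity is polynomial in the dimension $n$ and in $1/\sigma$, even though the worst case (the $\sigma \to 0$ limit) is exponential. The learning-theoretic analogue would replace running time with, say, sample complexity under a perturbed target concept.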
2. There is a lot of theory for supervised learning and quite a bit for online learning and density estimation. However, there isn’t much theory for unsupervised/representation learning* (though see, e.g., Nina Balcan’s thesis). Indeed, as Roger has discussed, it’s hard to even formulate the problem we are solving in the unsupervised setting.
3. There are various results in Bayesian theory regarding consistency and convergence of Bayesian inference. There is also an area, sitting somewhat within statistical learning theory, called PAC-Bayesian theory. PAC-Bayes theory provides some very tight generalization bounds, applicable in supervised and unsupervised learning, density estimation, and reinforcement learning. It’s a beautiful area of work that integrates the Bayesian concept of a prior distribution over the hypothesis class, which neither computational learning theory nor standard statistical learning theory does. But while it has a Bayesian flavor, it’s not strictly concerned with the classical conception of Bayesian inference. It would be interesting to have a “Bayesian learning theory,” though it’s possible such a thing is already achievable using existing PAC-Bayesian results.**
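To give the flavor of these bounds, here is one common PAC-Bayes statement (a McAllester-style bound for 0–1 loss; the exact constants vary across versions, so treat this as a sketch):

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for ALL "posterior" distributions Q over hypotheses:
\mathbb{E}_{h \sim Q}\big[\mathrm{err}(h)\big]
  \;\le\;
  \mathbb{E}_{h \sim Q}\big[\widehat{\mathrm{err}}_S(h)\big]
  \;+\;
  \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) \;+\; \ln\frac{2\sqrt{m}}{\delta}}{2m}}
```

Here $P$ is the prior over the hypothesis class, fixed before seeing the data, and the $\mathrm{KL}(Q \,\|\, P)$ term is exactly where the Bayesian-style prior enters: posteriors far from the prior pay a larger complexity penalty. Note that the bound holds uniformly over all $Q$, so $Q$ need not be the Bayesian posterior, which is part of why PAC-Bayes is not quite classical Bayesian inference.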
* Now obligatory disclaimer regarding the subtlety of this distinction, with a link to Roger’s post.
** I am not a PAC-Bayes guru by any means, so any corrections to this portion from an expert are appreciated.