[latexpage] Before diving into more technical posts, I want to briefly touch on some basic questions, and the big picture behind unsupervised learning. I also want to do a bit of handwaving on sparsity—a topic that has gotten a lot of attention recently. Let’s say we are given observations $\mathbf{y}_1,\ldots,\mathbf{y}_N\in\mathbb{R}^D$. These points are assumed to contain some underlying structure, which we seek to capture in order to perform tasks such as classification or compression. We can apply our algorithms on the data in their raw form—which carries unidentified redundancy—and hope for the best. However, a more sensible approach would be to first