# Learning Image Features from Video

The authors use a stacked architecture in which each layer is trained by optimizing an embedding into a feature space. As usual, the optimization problem includes a reconstruction penalty and a sparsity penalty. In addition, however, it includes a temporal slowness penalty, which minimizes the $$L_1$$ norm of the difference between the feature representations of consecutive frames. This encodes the intuition that good representations of images should change slowly as the images deform. Using this approach, the authors achieve improved performance on several classification benchmarks.
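To make the three penalties concrete, here is a minimal NumPy sketch of a per-layer objective of this general shape. The linear encoder `W`, the weights `lam_sparse` and `lam_slow`, and the function name `layer_loss` are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def layer_loss(X, W, lam_sparse=0.1, lam_slow=0.5):
    """Illustrative per-layer objective (hypothetical form):
    reconstruction + L1 sparsity + L1 temporal slowness.

    X: (T, d) array of T consecutive video frames, flattened to d pixels.
    W: (d, k) linear encoder mapping frames to k features.
    """
    Z = X @ W                                   # feature embedding, shape (T, k)
    recon = np.sum((Z @ W.T - X) ** 2)          # reconstruction penalty
    sparsity = lam_sparse * np.abs(Z).sum()     # sparsity penalty on features
    # temporal slowness: L1 norm of feature differences between consecutive frames
    slowness = lam_slow * np.abs(Z[1:] - Z[:-1]).sum()
    return recon + sparsity + slowness
```

Note that if all frames are identical, the slowness term vanishes, so the penalty only activates when the representation changes over time.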