Learning Image Features from Video

Robert Nishihara Machine Learning Leave a Comment

While at NIPS, I came across the paper Deep Learning of Invariant Features via Simulated Fixations in Video by Will Zou, Shenghuo Zhu, Andrew Ng,  and Kai Yu. It proposes a particularly appealing unsupervised method for using videos to learn image features. Their method appears to be somewhat inspired by the human visual system. For instance, people have access to video data, not static images. They also attempt to mimic the human tendency to fixate on particular objects. They track objects through successive frames in order to provide more coherent data to the learning algorithm. The authors use a stacked architecture, where each layer is trained by optimizing an embedding into a feature space. As usual, the optimization problem involves a reconstruction …