Hashing, streaming and sketching

Elaine Angelino Machine Learning Leave a Comment

One of the questions in the air at NIPS 2012 was, how do we make machine learning algorithms scale to large datasets? There are two main approaches: (1) developing parallelizable ML algorithms and integrating them with large parallel systems and (2) developing more efficient algorithms. More often than not, the latter approach requires some sort of relaxation of an underlying task. Hashing, streaming algorithms and sketching are increasingly employed to achieve efficient approximate algorithms that arise in ML tasks. Below, I highlight a few examples, mostly from NIPS 2012, with several coming from the Big Learning workshop. Nearest neighbor search (or similarity search) appears in many “meta” ML tasks such as information retrieval and near-duplicate detection. Many approximate approches are …