A useful way to think about derivatives (and gradients/Jacobians more generally) is as the maps that give you the best affine approximation at a point. This is part of a series of videos for COS 302: Mathematics for Numerical Computation and Machine Learning, replacing lectures after the course went remote due to the COVID-19 pandemic.
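As a minimal sketch of the idea (my own illustration, not from the video): the affine map A(x) = f(a) + f'(a)(x − a) is the *best* affine approximation in the sense that its error shrinks quadratically as x approaches a, which we can check numerically for f(x) = sin(x).

```python
import math

# The derivative gives the best affine approximation at a point:
# A(x) = f(a) + f'(a) * (x - a). For f = sin, f' = cos.
def f(x):
    return math.sin(x)

def affine_approx(a, x):
    return math.sin(a) + math.cos(a) * (x - a)

a = 1.0
for h in [0.1, 0.01, 0.001]:
    err = abs(f(a + h) - affine_approx(a, a + h))
    print(h, err)  # error shrinks roughly like h**2
```

Shrinking h by 10x shrinks the error by roughly 100x, which is exactly the quadratic behavior that distinguishes the derivative's affine map from any other line through (a, f(a)).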
We think about gradients a lot in machine learning. This video talks about partial derivatives in general and Jacobian matrices, which specialize to gradients for scalar functions.
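As a small sketch (my own example, not from the video): for a vector-valued function the Jacobian collects all the partial derivatives, and we can verify an analytic Jacobian against central finite differences. Here f(x, y) = (xy, x + y²), whose Jacobian is [[y, x], [1, 2y]].

```python
import numpy as np

def f(v):
    x, y = v
    return np.array([x * y, x + y**2])

def jacobian_fd(func, v, eps=1e-6):
    # Central finite differences: column j approximates ∂f/∂v_j.
    out = func(v)
    J = np.zeros((len(out), len(v)))
    for j in range(len(v)):
        e = np.zeros(len(v))
        e[j] = eps
        J[:, j] = (func(v + e) - func(v - e)) / (2 * eps)
    return J

v = np.array([2.0, 3.0])
J_analytic = np.array([[3.0, 2.0],   # [y, x]
                       [1.0, 6.0]])  # [1, 2y]
print(np.allclose(jacobian_fd(f, v), J_analytic, atol=1e-5))  # True
```

When the output is a scalar, the Jacobian is a single row, i.e., the (transposed) gradient.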
Differentiation is fundamental to lots of different aspects of machine learning. This video starts at the beginning and talks about the limit of the difference quotient, the product rule, and the chain rule.
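As a quick numerical companion (my own sketch, not from the video): the difference quotient (f(x+h) − f(x))/h approaches f'(x) as h shrinks, and we can also sanity-check the product rule (fg)' = f'g + fg' at a point.

```python
# Difference quotient converging to the derivative, plus a product-rule check.
def f(x):
    return x ** 2  # f'(x) = 2x

def g(x):
    return x ** 3  # g'(x) = 3x^2

def diff_quotient(func, x, h):
    return (func(x + h) - func(x)) / h

x = 3.0
for h in [1e-1, 1e-3, 1e-5]:
    print(h, diff_quotient(f, x, h))  # approaches f'(3) = 6

# Product rule: d/dx [f(x) g(x)] = f'(x) g(x) + f(x) g'(x) = 5x^4 = 405 at x = 3.
h = 1e-6
lhs = diff_quotient(lambda t: f(t) * g(t), x, h)
rhs = 2 * x * g(x) + f(x) * 3 * x ** 2
print(lhs, rhs)  # both approximately 405
```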
The log marginal likelihood is a central object for Bayesian inference with latent variable models: $\log p(x \mid \theta) = \log \int p(x, z \mid \theta)\,dz$, where $x$ are observations, $z$ are latent variables, and $\theta$ are parameters. Variational inference tackles this problem by approximating the posterior over $z$ with a simpler density $q(z)$. Often this density has a factored structure, for example $q(z) = \prod_i q_i(z_i)$. The approximating density is fit by maximizing a lower bound on the log marginal likelihood, or “evidence” (hence ELBO = evidence lower bound): $\log p(x \mid \theta) \geq \mathbb{E}_{q(z)}\left[\log p(x, z \mid \theta) - \log q(z)\right]$. The hope is that this will be a tight enough bound that we can use it as a proxy for the marginal likelihood when reasoning about $\theta$. The ELBO is typically derived in one of two ways: via Jensen’s inequality or by writing down the …
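A minimal Monte Carlo sketch of the bound (my own toy model, not from the post): take $p(z) = \mathcal{N}(0, 1)$ and $p(x \mid z) = \mathcal{N}(z, 1)$, so the true marginal is $\mathcal{N}(0, 2)$ in closed form and the exact posterior is $\mathcal{N}(x/2, 1/2)$. Sampling $z \sim q$ and averaging $\log p(x, z) - \log q(z)$ estimates the ELBO, which is loose for a poor $q$ and tight when $q$ equals the posterior.

```python
import math
import random

def log_normal(x, mean, var):
    # Log density of N(mean, var).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def elbo(x, mu, sigma, n_samples=100_000, seed=0):
    # Monte Carlo estimate of E_q[log p(x, z) - log q(z)] with q = N(mu, sigma^2).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = mu + sigma * rng.gauss(0, 1)           # z ~ q
        total += (log_normal(z, 0.0, 1.0)          # log p(z)
                  + log_normal(x, z, 1.0)          # log p(x | z)
                  - log_normal(z, mu, sigma ** 2)) # - log q(z)
    return total / n_samples

x = 1.5
log_marginal = log_normal(x, 0.0, 2.0)
print(elbo(x, 0.0, 1.0), log_marginal)                # loose: q far from the posterior
print(elbo(x, x / 2, math.sqrt(0.5)), log_marginal)   # tight: q is the exact posterior
```

Note that when $q$ equals the exact posterior the integrand is constant and the bound holds with equality, which is the "tight enough" regime the post is hoping for.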
A few years ago, I co-founded the Talking Machines podcast with Katy Gorman and we co-hosted it together for two seasons. After those two seasons, I decided it was time to move on and let someone else host it, but it was a highly rewarding experience. Since then, I’ve contemplated starting a YouTube channel, motivated in part by that outreach experience and also by the amazing people on the platform whom I’ve enjoyed watching and learning from. I’ve often felt there was a gap in YouTube’s coverage of machine learning topics. There are zillions of deep learning “explainer” videos and of course many online lectures, but there’s not a lot that 1) explains the broader technical context of machine learning, and …
The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity.
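To make the validity-preservation idea concrete (a toy illustration of the general principle, not the paper's model): consider balanced bracket strings as the discrete domain. If the only transitions are local operations that insert or delete a matched "()" pair, every state the chain visits is valid by construction.

```python
import random

# Toy Markov chain on balanced bracket strings. Each transition is a local
# operation that preserves validity: insert a matched "()" pair anywhere,
# or delete one adjacent "()" pair.
def step(s, rng):
    if rng.random() < 0.5 or len(s) == 0:
        i = rng.randrange(len(s) + 1)
        return s[:i] + "()" + s[i:]
    pairs = [i for i in range(len(s) - 1) if s[i:i + 2] == "()"]
    i = rng.choice(pairs)  # any nonempty balanced string contains "()"
    return s[:i] + s[i + 2:]

def is_balanced(s):
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

rng = random.Random(0)
s = ""
for _ in range(1000):
    s = step(s, rng)
    assert is_balanced(s)  # validity is an invariant of the local moves
```

Restricting the proposal set this way means the model never has to assign probability mass to, or learn to avoid, the vastly larger space of invalid objects.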
We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates.
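A minimal sketch of the telescoping idea on a scalar series (my own toy example, simpler than the paper's gradient setting): write the limit L as a sum of increments Δₙ, sample a truncation level N from a distribution q, and return Δ_N / q(N). This single-term estimator is unbiased for L even though each evaluation only touches one term.

```python
import random

# Single-term randomized telescope estimator for L = sum_{n>=1} Delta_n,
# with Delta_n = r**n so that L = r / (1 - r) in closed form.
# Sample N ~ q with q(n) = (1 - p) * p**(n - 1); return Delta_N / q(N).
def rt_estimate(r, p, rng):
    n = 1
    while rng.random() < p:  # geometric sampling of the truncation level
        n += 1
    q_n = (1 - p) * p ** (n - 1)
    return r ** n / q_n

rng = random.Random(0)
r, p = 0.5, 0.6
true_L = r / (1 - r)  # = 1.0
est = sum(rt_estimate(r, p, rng) for _ in range(200_000)) / 200_000
print(est, true_L)  # est is close to 1.0 up to Monte Carlo noise
```

The catch, which the paper addresses for gradients, is the bias-free estimator's variance: q must decay slowly enough relative to the increments (here r²/p < 1) for the variance to stay finite.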