In the spirit of Ryan’s most recent post, I will discuss a fundamental snippet from numerical linear algebra: one that facilitates computation at the same price as not facilitating it. In our everyday lives, we often come across theoretical expressions that involve a matrix inverse applied to a vector, such as A⁻¹b. When we proceed to code this up, it is very tempting to first compute A⁻¹ explicitly. Resist doing this! There are several reasons why there is no point in actually forming an explicit, tangible inverse.
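A minimal NumPy sketch of the point above (the matrix sizes and random data are mine, for illustration): instead of computing the inverse and multiplying, solve the linear system directly.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(300, 300))
b = rng.normal(size=300)

# Tempting but wasteful: explicitly invert A, then multiply.
x_inv = np.linalg.inv(A) @ b

# Better: solve the linear system A x = b directly.
# One LU factorization, no explicit inverse is ever formed;
# this is both cheaper and numerically more accurate.
x_solve = np.linalg.solve(A, b)

# Both approaches agree up to floating-point error.
print(np.allclose(x_inv, x_solve))
```

For well-conditioned systems the two answers coincide to machine precision, but `solve` skips an entire matrix product and accumulates less rounding error.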
This post is about a computational trick that everyone should know, but that doesn’t tend to be explicitly taught in machine learning courses. Imagine that we have a set of values x₁, …, x_N, and we want to compute the quantity

    log ∑ₙ exp(xₙ).   (1)

This comes up all the time when you want to parameterize a multinomial distribution using a softmax, e.g., when doing logistic regression with more than two unordered categories. If you want to compute the log likelihood, you’ll find such an expression due to the normalization constant. Computing this naively can be a recipe for disaster, due to underflow or overflow, depending on the scale of the xₙ.
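A minimal sketch of the log-sum-exp trick in NumPy (the function name and example values are mine; SciPy ships this as `scipy.special.logsumexp`): shift by the maximum before exponentiating, using the identity log ∑ₙ exp(xₙ) = c + log ∑ₙ exp(xₙ − c) with c = max(x).

```python
import numpy as np

def logsumexp(x):
    """Compute log(sum(exp(x))) stably by shifting by the max.

    With c = max(x), the largest term inside the sum is exp(0) = 1,
    so nothing overflows, and at least one term is nonzero, so the
    log never sees an underflowed zero.
    """
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

x = np.array([1000.0, 1001.0, 1002.0])

# Naive evaluation overflows: exp(1000) is inf in float64.
with np.errstate(over="ignore"):
    print(np.log(np.sum(np.exp(x))))  # inf

# The shifted version is finite and correct.
print(logsumexp(x))
```

The same shift rescues the underflow case (all xₙ very negative), where the naive sum collapses to zero and the log returns −inf.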