Often the goal of inference and learning is to use the inferred marginal distributions for prediction or classification purposes. In such scenarios, finding the correct “model structure” or the true “model parameters”, via maximum-likelihood (ML) estimation or (generalized) expectation-maximization (EM), is secondary to the final objective of minimizing a prediction or a classification cost function. Recently, I came across a few interesting papers on learning and inference in graphical models by direct optimization of a cost function of the inferred marginal distributions (or normalized beliefs) [1, 2, 3, 4]: , where f is a differentiable function that maps the beliefs (bs) to the outcomes/labels of interest, is a set of model parameters, and C is a differentiable cost function that penalizes for incorrect …