Accelerating and Improving Approximate Bayesian Inference
Probabilistic models are a powerful way to reason about uncertainty in support of prediction, decision making, and discovery. A probabilistic model is specified by positing a joint probability distribution that connects unknown parameters, unobserved latent variables, and observed data. We often describe this approach as "generative modeling" because it constructs a distribution over data from which we can sample. Not only does this let us "tell a story" about what latent factors might have given rise to the structure in the data, but it also lets us apply mathematical tools such as graph theory, in the form of probabilistic graphical models.
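For concreteness, consider a mixture model (our illustration, not tied to any one paper below): the joint distribution factors into a prior on parameters, a per-datum latent assignment, and an observation model,

p(\theta, z_{1:N}, x_{1:N}) = p(\theta) \prod_{n=1}^{N} p(z_n \mid \theta) \, p(x_n \mid z_n, \theta).

Sampling \theta, then each z_n, then each x_n is precisely the generative story, and this factorization is what a probabilistic graphical model depicts.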
The challenge of this approach, however, is that both learning (fitting parameters) and inference (fitting latent variables) require manipulating a conditional distribution, the Bayesian posterior, which rarely has convenient structure. To make predictions and decisions, we need to take expectations under this distribution. The goal of "approximate inference" is to develop computational algorithms for estimating such expectations.
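Concretely, the posterior follows from the joint distribution by Bayes' rule, and the quantities we care about are expectations under it:

p(\theta, z \mid x) = \frac{p(\theta, z, x)}{p(x)}, \qquad \mathbb{E}[f] = \int f(\theta, z) \, p(\theta, z \mid x) \, d\theta \, dz.

The normalizer p(x) = \int p(\theta, z, x) \, d\theta \, dz is the marginal likelihood, and its intractability is exactly what makes these expectations hard to compute.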
There are two main approaches to approximate inference: 1) drawing samples from the posterior using a technique such as Markov chain Monte Carlo, and 2) approximating the intractable posterior with a simpler, tractable family, i.e., variational inference. In the LIPS group, we study both approaches, and we also look for ways to apply these methods to real problems in, e.g., astronomy.
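To make the first approach concrete, here is a minimal sketch of a random-walk Metropolis sampler, the simplest instance of MCMC (the function names and step size are illustrative, not drawn from any paper below):

import numpy as np

def random_walk_metropolis(log_post, x0, n_samples, step=0.5, seed=0):
    # log_post: unnormalized log posterior; x0: initial state (1-D array)
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    samples = []
    for _ in range(n_samples):
        prop = x + step * rng.standard_normal(x.shape)
        lp_prop = log_post(prop)
        # Accept with probability min(1, p(prop) / p(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x.copy())
    return np.array(samples)

Expectations under the posterior then become sample averages, e.g., np.mean([f(x) for x in samples]). Much of the work below is about making this basic loop mix faster, scale to larger datasets, or exploit parallelism.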
Zoltowski, David M.; Cai, Diana; Adams, Ryan P.
Slice sampling reparameterization gradients Conference
Advances in Neural Information Processing Systems 34 (NeurIPS), 2021.
@conference{zoltowski2021slice,
title = {Slice sampling reparameterization gradients},
author = {David M. Zoltowski and Diana Cai and Ryan P. Adams},
year = {2021},
date = {2021-12-01},
booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS)},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Luo, Yucen; Beatson, Alex; Norouzi, Mohammad; Zhu, Jun; Duvenaud, David; Adams, Ryan P.; Chen, Ricky T. Q.
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models Conference
Proceedings of the Eighth International Conference on Learning Representations (ICLR), 2020.
@conference{luo2020sumo,
title = {SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models},
author = {Yucen Luo and
Alex Beatson and
Mohammad Norouzi and
Jun Zhu and
David Duvenaud and
Ryan P. Adams and
Ricky T. Q. Chen},
url = {https://openreview.net/forum?id=SylkYeHtwr},
year = {2020},
date = {2020-04-30},
booktitle = {Proceedings of the Eighth International Conference on Learning Representations (ICLR)},
abstract = {The standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize the variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
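In the notation we adopt here (a sketch distilled from the abstract, not the paper's full development): writing IWAE_k for the k-sample importance-weighted bound, the log marginal likelihood telescopes as

\log p(x) = \lim_{k \to \infty} \mathrm{IWAE}_k = \mathrm{IWAE}_1 + \sum_{k \ge 1} \Delta_k, \qquad \Delta_k = \mathrm{IWAE}_{k+1} - \mathrm{IWAE}_k,

and drawing a random truncation level K yields the unbiased estimator

\mathrm{SUMO} = \mathrm{IWAE}_1 + \sum_{k=1}^{K} \frac{\Delta_k}{\Pr(K \ge k)}, \qquad \mathbb{E}[\mathrm{SUMO}] = \log p(x).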
Beatson, Alex; Adams, Ryan P.
Efficient Optimization of Loops and Limits with Randomized Telescoping Sums Conference
Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
@conference{beatson2019efficient,
title = {Efficient Optimization of Loops and Limits with Randomized Telescoping Sums},
author = {Alex Beatson and
Ryan P. Adams},
url = {https://www.cs.princeton.edu/~rpa/pubs/beatson2019efficient.pdf},
year = {2019},
date = {2019-06-13},
booktitle = {Proceedings of the 36th International Conference on Machine Learning (ICML)},
abstract = {We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
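A minimal sketch of a single-sample randomized-telescope estimator as described in the abstract (the names and the capped support are our simplifications; the paper treats unbounded truncation levels and online tuning):

import numpy as np

def rt_estimate(approx, probs, rng):
    # approx(n): the n-th, increasingly accurate approximation Y_n of the limit
    # probs[n]:  q(N = n), a distribution over truncation levels 0..len(probs)-1
    q = np.asarray(probs, dtype=float)
    tail = q[::-1].cumsum()[::-1]        # tail[j] = P(N >= j)
    n = rng.choice(len(q), p=q)          # sample how far to unroll
    est, prev = 0.0, 0.0
    for j in range(n + 1):
        y = approx(j)
        est += (y - prev) / tail[j]      # telescoping term, importance-weighted
        prev = y
    return est                           # unbiased for lim_n Y_n

Applying the same reweighting to gradients of Y_n gives the cheap unbiased gradient estimates used for optimization.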
Regier, Jeffrey; Miller, Andrew C.; Schlegel, David; Adams, Ryan P.; McAuliffe, Jon D.; Prabhat
Approximate inference for constructing astronomical catalogs from images Journal Article
In: Annals of Applied Statistics, vol. 13, no. 3, pp. 1884–1926, 2019.
@article{regier2019approximate,
title = {Approximate inference for constructing astronomical catalogs from images},
author = {Jeffrey Regier and
Andrew C. Miller and
David Schlegel and
Ryan P. Adams and
Jon D. McAuliffe and
Prabhat},
url = {https://www.cs.princeton.edu/~rpa/pubs/regier2019approximate.pdf},
year = {2019},
date = {2019-03-01},
journal = {Annals of Applied Statistics},
volume = {13},
number = {3},
pages = {1884--1926},
abstract = {We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Miller, Andrew C.; Foti, Nicholas J.; Adams, Ryan P.
Variational Boosting: Iteratively Refining Posterior Approximations Conference
Proceedings of the 34th International Conference on Machine Learning (ICML), 2017, (arXiv:1611.06585 [stat.ML]).
@conference{miller2017boosting,
title = {Variational Boosting: Iteratively Refining Posterior Approximations},
author = {Andrew C. Miller and Nicholas J. Foti and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/miller2017boosting.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proceedings of the 34th International Conference on Machine Learning (ICML)},
abstract = {We propose a black-box variational inference method to
approximate intractable distributions with an increasingly
rich approximating class. Our method, termed variational
boosting, iteratively refines an existing variational
approximation by solving a sequence of optimization problems,
allowing the practitioner to trade computation time for
accuracy. We show how to expand the variational approximating
class by incorporating additional covariance structure and by
introducing new components to form a mixture. We apply
variational boosting to synthetic and real statistical models,
and show that resulting posterior inferences compare favorably
to existing posterior approximation algorithms in both
accuracy and efficiency.},
note = {arXiv:1611.06585 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
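The mixture-refinement step has a simple form (our paraphrase of the abstract): given the current approximation q_t, introduce a new component and reweight,

q_{t+1}(\theta) = (1 - \rho_t) \, q_t(\theta) + \rho_t \, q_{\mathrm{new}}(\theta; \lambda_t),

optimizing the component parameters \lambda_t and mixing weight \rho_t against the variational objective, so each boosting round can only enrich the approximating family.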
Miller, Andrew C.; Foti, Nicholas J.; d'Amour, Alexander; Adams, Ryan P.
Reducing Reparameterization Gradient Variance Conference
Advances in Neural Information Processing Systems (NIPS) 30, 2017, (arXiv:1705.07880 [stat.ML]).
@conference{miller2017reducing,
title = {Reducing Reparameterization Gradient Variance},
author = {Andrew C. Miller and Nicholas J. Foti and Alexander d'Amour and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/miller2017reducing.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 30},
abstract = {Optimization with noisy gradients has become ubiquitous in
statistics and machine learning. Reparameterization gradients,
or gradient estimates computed via the "reparameterization
trick," represent a class of noisy gradients often used in
Monte Carlo variational inference (MCVI). However, when these
gradient estimators are too noisy, the optimization procedure
can be slow or fail to converge. One way to reduce noise is to
use more samples for the gradient estimate, but this can be
computationally expensive. Instead, we view the noisy gradient
as a random variable, and form an inexpensive approximation of
the generating procedure for the gradient sample. This
approximation has high correlation with the noisy gradient by
construction, making it a useful control variate for variance
reduction. We demonstrate our approach on non-conjugate
multi-level hierarchical models and a Bayesian neural net
where we observed gradient variance reductions of multiple
orders of magnitude (20-2,000x).},
note = {arXiv:1705.07880 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
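The paper's estimator is a particular linearization of the gradient-generating procedure; the generic control-variate recipe it instantiates looks like this (a sketch with hypothetical names, assuming h is computed on the same random draws as g and has a known mean):

import numpy as np

def cv_gradient(g_samples, h_samples, h_mean):
    # g_samples: (S, D) noisy reparameterization gradients
    # h_samples: (S, D) cheap approximation evaluated on the same draws
    # h_mean:    (D,)   closed-form expectation of the approximation
    g_bar = g_samples.mean(axis=0)
    h_bar = h_samples.mean(axis=0)
    gc, hc = g_samples - g_bar, h_samples - h_bar
    c = (gc * hc).mean(axis=0) / (hc * hc).mean(axis=0)  # per-dim optimal scaling
    # Subtracting c * (h_bar - h_mean) leaves the expectation unchanged but,
    # when h is highly correlated with g, removes most of the variance.
    return g_bar - c * (h_bar - h_mean)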
Angelino, Elaine; Johnson, Matthew J.; Adams, Ryan P.
Patterns of Scalable Bayesian Inference Journal Article
In: Foundations and Trends in Machine Learning, vol. 9, no. 2-3, pp. 119–247, 2016, (arXiv:1602.05221 [stat.ML]).
@article{angelino2016patterns,
title = {Patterns of Scalable Bayesian Inference},
author = {Elaine Angelino and Matthew J. Johnson and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/angelino2016patterns.pdf},
year = {2016},
date = {2016-01-01},
journal = {Foundations and Trends in Machine Learning},
volume = {9},
number = {2-3},
pages = {119--247},
abstract = {Datasets are growing not just in size but in complexity,
creating a demand for rich models and quantification of
uncertainty. Bayesian methods are an excellent fit for this
demand, but scaling Bayesian inference is a challenge. In
response to this challenge, there has been considerable recent
work based on varying assumptions about model structure,
underlying computational resources, and the importance of
asymptotic correctness. As a result, there is a zoo of ideas
with few clear overarching principles. In this paper, we seek
to identify unifying principles, patterns, and intuitions for
scaling Bayesian inference. We review existing work on
utilizing modern computing resources with both MCMC and
variational approximation techniques. From this taxonomy of
ideas, we characterize the general principles that have proven
successful for designing scalable inference procedures and
comment on the path forward.},
note = {arXiv:1602.05221 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Duvenaud, David; Maclaurin, Dougal; Adams, Ryan P.
Early Stopping is Nonparametric Variational Inference Conference
Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, (arXiv:1504.01344 [stat.ML]).
@conference{duvenaud2016early,
title = {Early Stopping is Nonparametric Variational Inference},
author = {David Duvenaud and Dougal Maclaurin and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/duvenaud2016early.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)},
abstract = {We show that unconverged stochastic gradient descent can be
interpreted as a procedure that samples from a nonparametric
variational approximate posterior distribution. This
distribution is implicitly defined as the transformation of an
initial distribution by a sequence of optimization updates. By
tracking the change in entropy over this sequence of
transformations during optimization, we form a scalable,
unbiased estimate of the variational lower bound on the log
marginal likelihood. We can use this bound to optimize
hyperparameters instead of using cross-validation. This
Bayesian interpretation of SGD suggests improved,
overfitting-resistant optimization procedures, and gives a
theoretical foundation for popular tricks such as early
stopping and ensembling. We investigate the properties of this
marginal likelihood estimator on neural network models.},
note = {arXiv:1504.01344 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
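The entropy bookkeeping in the abstract can be written down directly (our sketch): a deterministic update \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) transforms the implicit distribution q_t with Jacobian I - \eta \nabla^2 L(\theta_t), so the entropy evolves as

S[q_{t+1}] = S[q_t] + \mathbb{E}_{q_t}\left[ \log \left| \det\left( I - \eta \nabla^2 L(\theta) \right) \right| \right],

and tracking this running entropy alongside the expected log joint yields the variational lower bound on the log marginal likelihood.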
Johnson, Matthew J.; Duvenaud, David; Wiltschko, Alexander B.; Datta, Sandeep Robert; Adams, Ryan P.
Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference Conference
Advances in Neural Information Processing Systems (NIPS) 29, 2016, (arXiv:1603.06277 [stat.ML]).
@conference{johnson2016svae,
title = {Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference},
author = {Matthew J. Johnson and David Duvenaud and Alexander B. Wiltschko and Sandeep Robert Datta and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/johnson2016svae.pdf},
year = {2016},
date = {2016-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 29},
abstract = {We propose a general modeling and inference framework that
composes probabilistic graphical models with deep learning
methods and combines their respective strengths. Our model
family augments graphical structure in latent variables with
neural network observation models. For inference, we extend
variational autoencoders to use graphical model approximating
distributions with recognition networks that output conjugate
potentials. All components of these models are learned
simultaneously with a single objective, giving a scalable
algorithm that leverages stochastic variational inference,
natural gradients, graphical model message passing, and the
reparameterization trick. We illustrate this framework with
several example models and an application to mouse behavioral
phenotyping.},
note = {arXiv:1603.06277 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Grosse, Roger B.; Ghahramani, Zoubin; Adams, Ryan P.
Sandwiching the Marginal Likelihood Using Bidirectional Monte Carlo Unpublished
2016, (arXiv:1511.02543 [stat.ML]).
@unpublished{grosse2015sandwiching,
title = {Sandwiching the Marginal Likelihood Using Bidirectional Monte Carlo},
author = {Roger B. Grosse and Zoubin Ghahramani and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/grosse2015sandwiching.pdf},
year = {2016},
date = {2016-01-01},
abstract = {Computing the marginal likelihood (ML) of a model requires
marginalizing out all of the parameters and latent variables,
a difficult high-dimensional summation or integration
problem. To make matters worse, it is often hard to measure
the accuracy of one’s ML estimates. We present bidirectional
Monte Carlo, a technique for obtaining accurate log-ML
estimates on data simulated from a model. This method obtains
stochastic lower bounds on the log-ML using annealed
importance sampling or sequential Monte Carlo, and obtains
stochastic upper bounds by running these same algorithms in
reverse starting from an exact posterior sample. The true
value can be sandwiched between these two stochastic bounds
with high probability. Using the ground truth log-ML estimates
obtained from our method, we quantitatively evaluate a wide
variety of existing ML estimators on several latent variable
models: clustering, a low rank approximation, and a binary
attributes model. These experiments yield insights into how to
accurately estimate marginal likelihoods.},
note = {arXiv:1511.02543 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
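The sandwich itself is a pair of Jensen bounds (our compact statement): AIS yields an estimate \hat{Z} with \mathbb{E}[\hat{Z}] = Z, while the reversed run from an exact posterior sample yields \hat{Z}_{\mathrm{rev}} with \mathbb{E}[1 / \hat{Z}_{\mathrm{rev}}] = 1 / Z, so

\mathbb{E}[\log \hat{Z}] \le \log Z \le \mathbb{E}[\log \hat{Z}_{\mathrm{rev}}],

and both gaps shrink as the number of annealing steps grows, pinning the true value between stochastic bounds with high probability.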
Rao, Vinayak; Adams, Ryan P.; Dunson, David B.
Bayesian Inference for Matérn Repulsive Processes Journal Article
In: Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 79, no. 3, pp. 877–897, 2016, (arXiv:1308.1136 [stat.ME]).
@article{rao2016matern,
title = {Bayesian Inference for Matérn Repulsive Processes},
author = {Vinayak Rao and Ryan P. Adams and David B. Dunson},
url = {http://www.cs.princeton.edu/~rpa/pubs/rao2016matern.pdf},
year = {2016},
date = {2016-01-01},
journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)},
volume = {79},
number = {3},
pages = {877--897},
abstract = {In many applications involving point pattern data, the Poisson
process assumption is unrealistic, with the data exhibiting a
more regular spread. Such a repulsion between events is
exhibited by trees for example, because of competition for
light and nutrients. Other examples include the locations of
biological cells and cities, and the times of neuronal
spikes. Given the many applications of repulsive point
processes, there is a surprisingly limited literature
developing flexible, realistic and interpretable models, as
well as efficient inferential methods. We address this gap by
developing a modelling framework around the Matérn type-III
repulsive process. We consider a number of extensions of the
original Matérn type-III process for both the homogeneous
and inhomogeneous cases. We also derive the probability
density of this generalized Matérn process. This allows us
to characterize the posterior distribution of the various
latent variables, and leads to a novel and efficient Markov
chain Monte Carlo algorithm. We apply our ideas to datasets
involving the spatial locations of trees, nerve fiber cells
and Greyhound bus stations.},
note = {arXiv:1308.1136 [stat.ME]},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Regier, Jeffrey; Miller, Andrew C.; McAuliffe, Jon; Adams, Ryan P.; Hoffman, Matthew D.; Lang, Dustin; Schlegel, David; Prabhat
Celeste: Variational Inference for a Generative Model of Astronomical Images Conference
Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015, (arXiv:1506.01351 [astro-ph.IM]).
@conference{regier2015celeste,
title = {Celeste: Variational Inference for a Generative Model of Astronomical Images},
author = {Jeffrey Regier and Andrew C. Miller and Jon McAuliffe and Ryan P. Adams and Matthew D. Hoffman and Dustin Lang and David Schlegel and Prabhat},
url = {http://www.cs.princeton.edu/~rpa/pubs/regier2015celeste.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Proceedings of the 32nd International Conference on Machine Learning (ICML)},
abstract = {We present a new, fully generative model of optical telescope
image sets, along with a variational procedure for
inference. Each pixel intensity is treated as a Poisson random
variable, with a rate parameter dependent on latent properties
of stars and galaxies. Key latent properties are themselves
random, with scientific prior distributions constructed from
large ancillary data sets. We check our approach on synthetic
images. We also run it on images from a major sky survey,
where it exceeds the performance of the current
state-of-the-art method for locating celestial bodies and
measuring their colors.},
note = {arXiv:1506.01351 [astro-ph.IM]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
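The observation model is worth stating explicitly (our notation): each pixel m of image n is

x_{nm} \sim \mathrm{Poisson}\left( \lambda_{nm}(z) \right),

where the rate \lambda_{nm} is a deterministic rendering of the latent catalog z (positions, fluxes, colors, and star/galaxy types), and the latent properties themselves carry scientific priors built from ancillary data.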
Linderman, Scott W.; Adams, Ryan P.
Scalable Bayesian Inference for Excitatory Point Process Networks Unpublished
2015, (arXiv:1507.03228 [stat.ML]).
@unpublished{linderman2015scalable,
title = {Scalable Bayesian Inference for Excitatory Point Process Networks},
author = {Scott W. Linderman and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/linderman2015scalable.pdf},
year = {2015},
date = {2015-01-01},
abstract = {Networks capture our intuition about relationships in the
world. They describe the friendships between Facebook users,
interactions in financial markets, and synapses connecting
neurons in the brain. These networks are richly structured
with cliques of friends, sectors of stocks, and a smorgasbord
of cell types that govern how neurons connect. Some networks,
like social network friendships, can be directly observed, but
in many cases we only have an indirect view of the network
through the actions of its constituents and an understanding
of how the network mediates that activity. In this work, we
focus on the problem of latent network discovery in the case
where the observable activity takes the form of a
mutually-excitatory point process known as a Hawkes
process. We build on previous work that has taken a Bayesian
approach to this problem, specifying prior distributions over
the latent network structure and a likelihood of observed
activity given this network. We extend this work by proposing
a discrete-time formulation and developing a computationally
efficient stochastic variational inference (SVI) algorithm
that allows us to scale the approach to long sequences of
observations. We demonstrate our algorithm on the calcium
imaging data used in the Chalearn neural connectomics
challenge.},
note = {arXiv:1507.03228 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
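A generic discrete-time rate of the kind described here (our sketch; the symbols are illustrative): with s_{m,t} the event count of node m in time bin t, the rate of node n is

\lambda_{n,t} = \lambda_n^{(0)} + \sum_{m} W_{m \to n} \sum_{\tau \ge 1} h_\tau \, s_{m, t - \tau},

where the nonnegative weights W_{m \to n} encode the latent excitatory network and h is a normalized time-decay kernel; placing a structured prior over W is what turns this into latent network discovery.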
Linderman, Scott W.; Johnson, Matthew J.; Adams, Ryan P.
Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation Conference
Advances in Neural Information Processing Systems (NIPS) 28, 2015, (arXiv:1506.05843 [stat.ML]).
@conference{linderman2015multinomial,
title = {Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation},
author = {Scott W. Linderman and Matthew J. Johnson and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/linderman2015multinomial.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 28},
abstract = {Many practical modeling problems involve discrete data that are
best represented as draws from multinomial or categorical
distributions. For example, nucleotides in a DNA sequence,
children's names in a given state and year, and text documents
are all commonly modeled with multinomial distributions. In
all of these cases, we expect some form of dependency between
the draws: the nucleotide at one position in the DNA strand
may depend on the preceding nucleotides, children's names are
highly correlated from year to year, and topics in text may be
correlated and dynamic. These dependencies are not naturally
captured by the typical Dirichlet-multinomial
formulation. Here, we leverage a logistic stick-breaking
representation and recent innovations in Polya-gamma
augmentation to reformulate the multinomial distribution in
terms of latent variables with jointly Gaussian likelihoods,
enabling us to take advantage of a host of Bayesian inference
techniques for Gaussian models with minimal overhead.},
note = {arXiv:1506.05843 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
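The key identity (in our notation) rewrites a multinomial observation as a cascade of binomials via the logistic stick-breaking map, which is what makes the Polya-gamma trick applicable:

\mathrm{Mult}(x \mid N, \pi(\psi)) = \prod_{k=1}^{K-1} \mathrm{Binom}\left( x_k \mid N_k, \sigma(\psi_k) \right), \qquad N_k = N - \sum_{j < k} x_j,

where \pi_k(\psi) = \sigma(\psi_k) \prod_{j < k} (1 - \sigma(\psi_j)). Each binomial factor is conditionally conjugate to a Gaussian prior on \psi_k after Polya-gamma augmentation.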
Nishihara, Robert; Murray, Iain; Adams, Ryan P.
Parallel MCMC with Generalized Elliptical Slice Sampling Journal Article
In: Journal of Machine Learning Research, vol. 15, no. 1, pp. 2087–2112, 2014.
@article{nishihara2014generalized,
title = {Parallel MCMC with Generalized Elliptical Slice Sampling},
author = {Robert Nishihara and Iain Murray and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/nishihara2014generalized.pdf},
year = {2014},
date = {2014-01-01},
journal = {Journal of Machine Learning Research},
volume = {15},
number = {1},
pages = {2087--2112},
abstract = {Probabilistic models are conceptually powerful tools for finding
structure in data, but their practical effectiveness is often
limited by our ability to perform inference in them. Exact
inference is frequently intractable, so approximate inference
is often performed using Markov chain Monte Carlo (MCMC). To
achieve the best possible results from MCMC, we want to
efficiently simulate many steps of a rapidly mixing Markov
chain which leaves the target distribution invariant. Of
particular interest in this regard is how to take advantage of
multi-core computing to speed up MCMC-based inference, both to
improve mixing and to distribute the computational load. In
this paper, we present a parallelizable Markov chain Monte
Carlo algorithm for efficiently sampling from continuous
probability distributions that can take advantage of hundreds
of cores. This method shares information between parallel
Markov chains to build a scale-location mixture of Gaussians
approximation to the density function of the target
distribution. We combine this approximation with a recently
developed method known as elliptical slice sampling to create
a Markov chain with no step-size parameters that can mix
rapidly without requiring gradient or curvature computations.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
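The scale-location mixture mentioned in the abstract can be made explicit (our sketch): a multivariate-t approximation admits the representation

x \mid \beta \sim \mathcal{N}(\mu, \Sigma / \beta), \qquad \beta \sim \mathrm{Gamma}(\nu/2, \nu/2),

so conditioned on the auxiliary scale \beta the approximation is Gaussian and elliptical slice sampling applies unchanged; resampling \beta between updates recovers the heavier-tailed approximation shared across chains.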
Affandi, Raja Hafiz; Fox, Emily B.; Adams, Ryan P.; Taskar, Ben
Learning the Parameters of Determinantal Point Process Kernels Conference
Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
@conference{affandi2014determinantal,
title = {Learning the Parameters of Determinantal Point Process Kernels},
author = {Raja Hafiz Affandi and Emily B. Fox and Ryan P. Adams and Ben Taskar},
url = {http://www.cs.princeton.edu/~rpa/pubs/affandi2014determinantal.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 31st International Conference on Machine Learning (ICML)},
abstract = {Determinantal point processes (DPPs) are well-suited for
modeling repulsion and have proven useful in many applications
where diversity is desired. While DPPs have many appealing
properties, such as efficient sampling, learning the
parameters of a DPP is still considered a difficult problem
due to the non-convex nature of the likelihood function. In
this paper, we propose using Bayesian methods to learn the DPP
kernel parameters. These methods are applicable in large-scale
and continuous DPP settings even when the exact form of the
eigendecomposition is unknown. We demonstrate the utility of
our DPP learning methods in studying the progression of
diabetic neuropathy based on spatial distribution of nerve
fibers, and in studying human perception of diversity in
images.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Maclaurin, Dougal; Adams, Ryan P.
Firefly Monte Carlo: Exact MCMC with Subsets of Data Conference
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014, (arXiv:1403.5693 [stat.ML]).
@conference{maclaurin2014firefly,
title = {Firefly Monte Carlo: Exact MCMC with Subsets of Data},
author = {Dougal Maclaurin and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/maclaurin2014firefly.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI)},
abstract = {Markov chain Monte Carlo (MCMC) is a popular and successful
general-purpose tool for Bayesian inference. However, MCMC
cannot be practically applied to large data sets because of
the prohibitive cost of evaluating every likelihood term at
every iteration. Here we present Firefly Monte Carlo (FlyMC),
an auxiliary variable MCMC algorithm that only queries the
likelihoods of a potentially small subset of the data at each
iteration yet simulates from the exact posterior distribution,
in contrast to recent proposals that are approximate even in
the asymptotic limit. FlyMC is compatible with a wide variety
of modern MCMC algorithms, and only requires a lower bound on
the per-datum likelihood factors. In experiments, we find that
FlyMC generates samples from the posterior more than an order
of magnitude faster than regular MCMC, opening up MCMC methods
to larger datasets than were previously considered feasible.},
note = {arXiv:1403.5693 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
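The auxiliary-variable construction can be stated compactly (our paraphrase): given per-datum bounds 0 < B_n(\theta) \le L_n(\theta), introduce brightness bits z_n \in \{0, 1\} with

p(\theta, z \mid x) \propto p(\theta) \prod_n \left( L_n(\theta) - B_n(\theta) \right)^{z_n} B_n(\theta)^{1 - z_n},

so that summing over z recovers the exact posterior p(\theta) \prod_n L_n(\theta). When the product of the bounds B_n has a cheap collective form (e.g., exponential-family bounds with precomputed sufficient statistics), only the "bright" points with z_n = 1 require touching the data at each iteration.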
Angelino, Elaine; Kohler, Eddie; Waterland, Amos; Seltzer, Margo; Adams, Ryan P.
Accelerating MCMC via Parallel Predictive Prefetching Conference
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014, (arXiv:1403.7265 [stat.ML]).
@conference{angelino2014accelerating,
title = {Accelerating MCMC via Parallel Predictive Prefetching},
author = {Elaine Angelino and Eddie Kohler and Amos Waterland and Margo Seltzer and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/angelino2014accelerating.pdf},
year = {2014},
date = {2014-01-01},
booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI)},
abstract = {Parallel predictive prefetching is a new framework for
accelerating a large class of widely-used Markov chain Monte
Carlo (MCMC) algorithms. It speculatively evaluates many
potential steps of an MCMC chain in parallel while exploiting
fast, iterative approximations to the target density. This
can accelerate sampling from target distributions in Bayesian
inference problems. Our approach takes advantage of whatever
parallel resources are available, but produces results exactly
equivalent to standard serial execution. In the initial
burn-in phase of chain evaluation, we achieve speedup close to
linear in the number of available cores.},
note = {arXiv:1403.7265 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Lovell, Dan; Malmaud, Jonathan; Adams, Ryan P.; Mansinghka, Vikash K.
ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures Unpublished
2013, (arXiv:1304.2302 [stat.ML]).
@unpublished{lovell2013cluster,
title = {ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures},
author = {Dan Lovell and Jonathan Malmaud and Ryan P. Adams and Vikash K. Mansinghka},
url = {http://www.cs.princeton.edu/~rpa/pubs/lovell2013cluster.pdf},
year = {2013},
date = {2013-01-01},
abstract = {The Dirichlet process (DP) is a fundamental mathematical tool
for Bayesian nonparametric modeling, and is widely used in
tasks such as density estimation, natural language processing,
and time series modeling. Although MCMC inference methods for
the DP often provide a gold standard in terms of asymptotic
accuracy, they can be computationally expensive and are not
obviously parallelizable. We propose a reparameterization of
the Dirichlet process that induces conditional independencies
between the atoms that form the random measure. This
conditional independence enables many of the Markov chain
transition operators for DP inference to be simulated in
parallel across multiple cores. Applied to mixture modeling,
our approach enables the Dirichlet process to simultaneously
learn clusters that describe the data and superclusters that
define the granularity of parallelization. Unlike previous
approaches, our technique does not require alteration of the
model and leaves the true posterior distribution invariant. It
also naturally lends itself to a distributed software
implementation in terms of Map-Reduce, which we test in
cluster configurations of over 50 machines and 100 cores. We
present experiments exploring the parallel efficiency and
convergence properties of our approach on both synthetic and
real-world data, including runs on 1MM data vectors in 256
dimensions.},
note = {arXiv:1304.2302 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Murray, Iain; Adams, Ryan P.; MacKay, David J. C.
Elliptical Slice Sampling Conference
Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, (arXiv:1001.0175 [stat.CO]).
@conference{murray2010elliptical,
title = {Elliptical Slice Sampling},
author = {Iain Murray and Ryan P. Adams and David J.C. MacKay},
url = {http://www.cs.princeton.edu/~rpa/pubs/murray2010elliptical.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS)},
abstract = {Many probabilistic models introduce strong dependencies
between variables using a latent multivariate Gaussian
distribution or a Gaussian process. We present a new Markov
chain Monte Carlo algorithm for performing inference in models
with multivariate Gaussian priors. Its key properties are: 1)
it has simple, generic code applicable to many models, 2) it
has no free parameters, 3) it works well for a variety of
Gaussian process based models. These properties make our
method ideal for use while model building, removing the need
to spend time deriving and tuning updates for more complex
algorithms.},
note = {arXiv:1001.0175 [stat.CO]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
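A compact sketch of the update for the zero-mean case, written from the algorithm as it is commonly presented (the prior_draw argument supplies samples from the N(0, \Sigma) prior):

import numpy as np

def elliptical_slice(f, log_lik, prior_draw, rng):
    # One transition; f is the current latent vector with prior N(0, Sigma).
    nu = prior_draw(rng)                         # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())   # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)  # stays on the ellipse
        if log_lik(f_prop) > log_y:
            return f_prop
        # Shrink the bracket toward theta = 0 (the current state) and retry.
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

The advertised properties are visible in the code: no step-size parameter, no gradients, and the loop must terminate because theta = 0 recovers the current state.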
Adams, Ryan P.; Wallach, Hanna M.; Ghahramani, Zoubin
Learning the Structure of Deep Sparse Graphical Models Conference
Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, (arXiv:1001.0160 [stat.ML]).
@conference{adams2010deep,
title = {Learning the Structure of Deep Sparse Graphical Models},
author = {Ryan P. Adams and Hanna M. Wallach and Zoubin Ghahramani},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2010deep.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS)},
abstract = {Deep belief networks are a powerful way to model complex
probability distributions. However, learning the structure of
a belief network, particularly one with hidden units, is
difficult. The Indian buffet process has been used as a
nonparametric Bayesian prior on the directed structure of a
belief network with a single infinitely wide hidden layer. In
this paper, we introduce the cascading Indian buffet process
(CIBP), which provides a nonparametric prior on the structure
of a layered, directed belief network that is unbounded in
both depth and width, yet allows tractable inference. We use
the CIBP prior with the nonlinear Gaussian belief network so
each unit can additionally vary its behavior between discrete
and continuous representations. We provide Markov chain Monte
Carlo algorithms for inference in these belief networks and
explore the structures learned on several image data sets.},
note = {arXiv:1001.0160 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Murray, Iain; Adams, Ryan P.
Slice Sampling Covariance Hyperparameters in Latent Gaussian Models Conference
Advances in Neural Information Processing Systems (NIPS) 23, 2010, (arXiv:1006.0868 [stat.CO]).
@conference{murray2010hyper,
title = {Slice Sampling Covariance Hyperparameters in Latent Gaussian Models},
author = {Iain Murray and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/murray2010hyper.pdf},
year = {2010},
date = {2010-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 23},
abstract = {The Gaussian process (GP) is a popular way to specify
dependencies between random variables in a probabilistic
model. In the Bayesian framework the covariance structure can
be specified using unknown hyperparameters. Integrating over
these hyperparameters considers different possible
explanations for the data when making predictions. This
integration is often performed using Markov chain Monte Carlo
(MCMC) sampling. However, with non-Gaussian observations
standard hyperparameter sampling approaches require careful
tuning and may converge slowly. In this paper we present a
slice sampling approach that requires little tuning while
mixing well in both strong- and weak-data regimes.},
note = {arXiv:1006.0868 [stat.CO]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
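One reparameterization in this family (our sketch of the idea, not the paper's full construction): write the latent vector as f = L_\theta \nu with L_\theta L_\theta^\top = \Sigma_\theta and \nu \sim \mathcal{N}(0, I), then slice-sample the hyperparameters with the whitened variables held fixed,

p(\theta \mid \nu, \mathrm{data}) \propto p(\theta) \, \mathcal{L}(\mathrm{data} \mid f = L_\theta \nu),

so a change in \theta automatically drags f to a configuration consistent with the new covariance, avoiding the slow random walk induced by conditioning on f directly.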
Adams, Ryan P.; Ghahramani, Zoubin
Archipelago: Nonparametric Bayesian Semi-Supervised Learning Conference
Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.
@conference{adams2009archipelago,
title = {Archipelago: Nonparametric Bayesian Semi-Supervised Learning},
author = {Ryan P. Adams and Zoubin Ghahramani},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009archipelago.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 26th International Conference on Machine Learning (ICML)},
abstract = {Semi-supervised learning (SSL) is classification where
additional unlabeled data can be used to improve
accuracy. Generative approaches are appealing in this
situation, as a model of the data's probability density can
assist in identifying clusters. Nonparametric Bayesian
methods, while ideal in theory due to their principled
motivations, have been difficult to apply to SSL in
practice. We present a nonparametric Bayesian method that uses
Gaussian processes for the generative model, avoiding many of
the problems associated with Dirichlet process mixture
models. Our model is fully generative and we take advantage of
recent advances in Markov chain Monte Carlo algorithms to
provide a practical inference method. Our method compares
favorably to competing approaches on synthetic and real-world
multi-class data.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Adams, Ryan P.; Murray, Iain; MacKay, David J. C.
The Gaussian Process Density Sampler Conference
Advances in Neural Information Processing Systems 21 (NIPS), 2009.
@conference{adams2009gpds,
title = {The Gaussian Process Density Sampler},
author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009gpds.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Advances in Neural Information Processing Systems 21 (NIPS)},
abstract = {We present the Gaussian Process Density Sampler (GPDS), an
exchangeable generative model for use in nonparametric
Bayesian density estimation. Samples drawn from the GPDS are
consistent with exact, independent samples from a fixed
density function that is a transformation of a function drawn
from a Gaussian process prior. Our formulation allows us to
infer an unknown density from data using Markov chain Monte
Carlo, which gives samples from the posterior distribution
over density functions and from the predictive distribution on
data space. We can also infer the hyperparameters of the
Gaussian process. We compare this density modeling technique
to several existing techniques on a toy problem and a
skull-reconstruction task.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
Adams, Ryan P.; Murray, Iain; MacKay, David J. C.
Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities Conference
Proceedings of the 26th International Conference on Machine Learning (ICML), Montréal, Canada, 2009.
@conference{adams2009poisson,
title = {Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities},
author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009gpds.pdf},
year = {2009},
date = {2009-01-01},
booktitle = {Proceedings of the 26th International Conference on
Machine Learning (ICML)},
address = {Montréal, Canada},
abstract = {The inhomogeneous Poisson process is a point process that has
varying intensity across its domain (usually time or
space). For nonparametric Bayesian modeling, the Gaussian
process is a useful way to place a prior distribution on this
intensity. The combination of a Poisson process and GP is
known as a Gaussian Cox process, or doubly-stochastic Poisson
process. Likelihood-based inference in these models requires
an intractable integral over an infinite-dimensional random
function. In this paper we present the first approach to
Gaussian Cox processes in which it is possible to perform
inference without introducing approximations or
finite-dimensional proxy distributions. We call our method the
Sigmoidal Gaussian Cox Process, which uses a generative model
for Poisson data to enable tractable inference via Markov
chain Monte Carlo. We compare our method to competing methods
on synthetic data and apply it to several real-world data
sets.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
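The tractability claim above rests on a thinning construction: simulate a homogeneous Poisson process at an upper-bound rate lam_max, then keep each event with probability given by a sigmoid-squashed GP value, which yields exact draws from an inhomogeneous process with intensity lam_max * sigmoid(g(s)). Below is a minimal sketch of that generative step; the fixed upper bound and the kernel choice are illustrative assumptions.

import numpy as np

def sample_sgcp(lam_max, T, lengthscale, rng, jitter=1e-8):
    """Events on [0, T] from an inhomogeneous Poisson process with
    intensity lam_max * sigmoid(g(s)), g ~ GP, generated by thinning."""
    # 1) Homogeneous Poisson process with rate lam_max on [0, T].
    n = rng.poisson(lam_max * T)
    S = np.sort(rng.uniform(0.0, T, size=n))
    if n == 0:
        return S
    # 2) Joint GP draw at the candidate event locations.
    K = np.exp(-0.5 * (S[:, None] - S[None, :]) ** 2 / lengthscale**2)
    g = np.linalg.cholesky(K + jitter * np.eye(n)) @ rng.standard_normal(n)
    # 3) Keep each candidate with probability sigmoid(g(s)).
    keep = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-g))
    return S[keep]

rng = np.random.default_rng(1)
print(len(sample_sgcp(lam_max=20.0, T=10.0, lengthscale=1.0, rng=rng)), "events kept")

Only finitely many GP values are ever needed, despite the intensity being an infinite-dimensional random function, which is what makes exact generation, and hence likelihood-free MCMC, possible.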
Adams, Ryan P.; Murray, Iain; MacKay, David J. C.
Nonparametric Bayesian Density Modeling with Gaussian Processes Unpublished
2009, (arXiv:0912.4896 [stat.CO]).
@unpublished{adams2009nonparametric,
title = {Nonparametric Bayesian Density Modeling with Gaussian Processes},
author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009nonparametric.pdf},
year = {2009},
date = {2009-01-01},
abstract = {We present the Gaussian process density sampler (GPDS), an
exchangeable generative model for use in nonparametric
Bayesian density estimation. Samples drawn from the GPDS are
consistent with exact, independent samples from a distribution
defined by a density that is a transformation of a function
drawn from a Gaussian process prior. Our formulation allows us
to infer an unknown density from data using Markov chain Monte
Carlo, which gives samples from the posterior distribution
over density functions and from the predictive distribution on
data space. We describe two such MCMC methods. Both methods
also allow inference of the hyperparameters of the Gaussian
process.},
note = {arXiv:0912.4896 [stat.CO]},
keywords = {},
pubstate = {published},
tppubtype = {unpublished}
}
Adams, Ryan P.
Kernel Methods for Nonparametric Bayesian Inference of Probability Densities and Point Processes PhD Thesis
University of Cambridge, 2009.
@phdthesis{adams2009thesis,
title = {Kernel Methods for Nonparametric Bayesian Inference of
Probability Densities and Point Processes},
author = {Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009thesis.pdf},
year = {2009},
date = {2009-01-01},
address = {Cambridge, UK},
school = {University of Cambridge},
abstract = {Nonparametric kernel methods for estimation of probability
densities and point process intensities have long been of
interest to researchers in statistics and machine
learning. Frequentist kernel methods are widely used, but
provide only a point estimate of the unknown
density. Additionally, in frequentist kernel density methods,
it can be difficult to select appropriate kernel
parameters. The Bayesian approach to inference potentially
resolves both of these deficiencies, by providing a
distribution over the unknowns and enabling a principled
approach to kernel selection. Constructing a Bayesian
nonparametric kernel density method has proven to be
difficult, however, due to the need to integrate over an
infinite-dimensional random function in order to evaluate the
likelihood. To avoid this intractability, all Bayesian kernel
density methods to date have either used a crippled model or a
finite-dimensional approximation. Recent advances in Markov
chain Monte Carlo methods have improved the situation for
these doubly-intractable posterior distributions, however. If
data can be generated exactly from the model, then it is
possible to perform inference without computing the
intractable likelihood. I propose two new kernel-based models
that enable an exact generative procedure: the Gaussian
process density sampler (GPDS) for probability density
functions, and the sigmoidal Gaussian Cox process (SGCP) for
the Poisson process. With generative priors, I show how it is
now possible to construct two different kinds of Markov
chains for inference in these models. These Markov chains have
the desired posterior distribution as their equilibrium
distributions, and, despite a parameter space with uncountably
many dimensions, require only a finite amount of computation
to simulate. The GPDS and SGCP, and the associated inference
procedures, are the first kernel-based nonparametric Bayesian
methods that allow inference without a finite-dimensional
approximation. I also present several additional kernel-based
models for data that extend the Gaussian process density
sampler and sigmoidal Gaussian Cox process to other
situations. The Archipelago model extends the GPDS to address
the task of semi-supervised learning, where a flexible density
estimate can improve the performance of a classifier when
unlabeled data are available. I also generalise the SGCP to
enable a nonparametric inhomogeneous Neyman–Scott
process, and present a soft-core generalisation of the Matérn
repulsive process that similarly allows
non-approximate inference via Markov chain Monte Carlo.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
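The key idea flagged in the abstract, that exact generation from the model lets one sidestep the intractable likelihood, is the trick behind exchange-style MCMC for doubly-intractable posteriors. The toy sketch below illustrates that trick on a one-parameter model where exact generation is easy; it is a generic exchange-algorithm sketch under a flat prior, not the thesis's GPDS or SGCP samplers.

import numpy as np

def log_f_tilde(y, theta):
    """Unnormalized log-likelihood exp(-theta * |y|); the normalizer
    Z(theta) is treated as unknown to the sampler."""
    return -theta * np.sum(np.abs(y))

def exchange_sampler(y, n_iters, rng, step=0.1):
    """Exchange algorithm: MCMC over theta > 0 that never evaluates
    Z(theta), using exact auxiliary draws from the model. A flat prior
    on theta > 0 is assumed, so the prior ratio is 1."""
    theta = 1.0
    chain = []
    for _ in range(n_iters):
        theta_prop = theta + step * rng.standard_normal()
        if theta_prop > 0:
            # Exact auxiliary dataset from the model at the proposed theta.
            w = rng.laplace(0.0, 1.0 / theta_prop, size=len(y))
            # The intractable normalizers cancel in this ratio.
            log_a = (log_f_tilde(y, theta_prop) + log_f_tilde(w, theta)
                     - log_f_tilde(y, theta) - log_f_tilde(w, theta_prop))
            if np.log(rng.uniform()) < log_a:
                theta = theta_prop
        chain.append(theta)
    return np.array(chain)

rng = np.random.default_rng(2)
y = rng.laplace(0.0, 1.0 / 2.5, size=200)   # data with true theta = 2.5
chain = exchange_sampler(y, 5000, rng)
print("posterior mean of theta:", chain[1000:].mean())

The auxiliary dataset w, drawn exactly at the proposed parameter, is what makes the two unknown normalizing constants cancel in the acceptance ratio.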
Adams, Ryan P.; Stegle, Oliver
Gaussian Process Product Models for Nonparametric Nonstationarity Conference
Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.
@conference{adams2008gppm,
title = {Gaussian Process Product Models for Nonparametric
Nonstationarity},
author = {Ryan P. Adams and Oliver Stegle},
url = {http://www.cs.princeton.edu/~rpa/pubs/adams2008gppm.pdf},
year = {2008},
date = {2008-01-01},
booktitle = {Proceedings of the 25th International Conference on
Machine Learning (ICML)},
pages = {1-8},
address = {Helsinki, Finland},
abstract = {Stationarity is often an unrealistic prior assumption for
Gaussian process regression. One solution is to predefine an
explicit nonstationary covariance function, but such
covariance functions can be difficult to specify and require
detailed prior knowledge of the nonstationarity. We propose
the Gaussian process product model (GPPM) which models data as
the pointwise product of two latent Gaussian processes to
nonparametrically infer nonstationary variations of
amplitude. This approach differs from other nonparametric
approaches to covariance function inference in that it
operates on the outputs rather than the inputs, resulting in a
significant reduction in computational cost and required data
for inference. We present an approximate inference scheme
using Expectation Propagation. This variational approximation
yields convenient GP hyperparameter selection and compact
approximate predictive distributions.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
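Taking the abstract's description literally, a pointwise product of a fast latent GP and a slow latent GP already exhibits nonstationary amplitude, as in the minimal sketch below. The offset on the amplitude process and the kernel settings are illustrative; the published model's exact parameterization and its EP-based inference are not reproduced here.

import numpy as np

def gp_draw(x, lengthscale, rng, jitter=1e-8):
    """One draw from a zero-mean GP with squared-exponential covariance."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-0.5 * d2 / lengthscale**2) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 400)

f = gp_draw(x, lengthscale=0.3, rng=rng)        # fast "signal" process
g = 1.0 + gp_draw(x, lengthscale=3.0, rng=rng)  # slow amplitude process
                                                # (offset keeps the envelope
                                                # mostly away from zero)
y = f * g  # pointwise product: the slow process modulates the amplitude
           # of the fast one, producing nonstationary variance

Because the modulation acts on the outputs, inference only requires reasoning about the latent function values at the observed inputs, which is the source of the computational savings the abstract mentions.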