# Accelerating and Improving Approximate Bayesian Inference

Probabilistic models are a powerful way to reason about uncertainty in support of prediction, decision making, and discovery. A probabilistic model is specified by a joint probability distribution that connects unknown parameters, unobserved latent variables, and observed data. This approach is often called "generative modeling" because it constructs a distribution over data from which we can sample. Not only does this let us "tell a story" about which latent factors might have contributed to the structure in the data, but it also lets us apply mathematical tools such as graph theory, in the form of probabilistic graphical models.

The challenge of this approach, however, is that learning (fitting parameters) and inference (fitting latent variables) both correspond to manipulating a conditional distribution, the Bayesian posterior, that is often intractable to work with exactly. To make predictions and decisions, we need to be able to take expectations under this distribution. The goal of "approximate inference" in this context is to develop computational algorithms that compute such expectations approximately.

There are two main approaches to approximate inference: 1) drawing samples from the posterior using a technique such as Markov chain Monte Carlo, and 2) approximating the intractable posterior with a simpler family, i.e., variational inference. In the LIPS group, we study both approaches and also look for ways to apply these methods to real problems in, e.g., astronomy.
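As a rough illustration of these two approaches, the sketch below estimates a posterior mean both ways on a toy model. Everything in it is an assumption chosen for illustration and is not taken from any of the papers listed here: the model (Gaussian observations with unit variance and a Gaussian prior on the mean), the function names, the step sizes, and the sample counts. The model is deliberately conjugate so that the exact posterior is available as a check.

```python
import numpy as np

PRIOR_VAR = 100.0  # assumed prior: mu ~ N(0, PRIOR_VAR); data: x_i ~ N(mu, 1)


def log_joint(mu, x):
    """Unnormalized log posterior, log p(x | mu) + log p(mu), up to a constant."""
    return -0.5 * np.sum((x - mu) ** 2) - 0.5 * mu ** 2 / PRIOR_VAR


def grad_log_joint(mu, x):
    """Derivative of log_joint with respect to mu (works for scalar or array mu)."""
    return x.sum() - x.size * mu - mu / PRIOR_VAR


def exact_posterior(x):
    """Closed-form posterior mean and standard deviation for this conjugate model."""
    precision = x.size + 1.0 / PRIOR_VAR
    return x.sum() / precision, precision ** -0.5


def metropolis(x, n_steps=20000, step=0.5, seed=0):
    """Approach 1: random-walk Metropolis samples from the posterior over mu."""
    rng = np.random.default_rng(seed)
    mu, lp = 0.0, log_joint(0.0, x)
    samples = np.empty(n_steps)
    for t in range(n_steps):
        proposal = mu + step * rng.normal()
        lp_proposal = log_joint(proposal, x)
        # Accept with probability min(1, p(proposal) / p(mu)).
        if np.log(rng.uniform()) < lp_proposal - lp:
            mu, lp = proposal, lp_proposal
        samples[t] = mu
    return samples[n_steps // 4:]  # drop the first quarter as burn-in


def gaussian_vi(x, n_steps=2000, lr=0.005, n_mc=16, seed=0):
    """Approach 2: fit q(mu) = N(m, s^2) by stochastic gradient ascent on the ELBO."""
    rng = np.random.default_rng(seed)
    m, log_s = 0.0, 0.0
    for _ in range(n_steps):
        eps = rng.normal(size=n_mc)
        s = np.exp(log_s)
        mu = m + s * eps                 # reparameterized draws from q
        g = grad_log_joint(mu, x)        # gradient of the log joint at each draw
        m += lr * g.mean()
        # The "+ 1.0" is the gradient of the Gaussian entropy with respect to log_s.
        log_s += lr * ((g * s * eps).mean() + 1.0)
    return m, np.exp(log_s)


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=50)
    print("exact (mean, std):", exact_posterior(data))
    print("MCMC mean:        ", metropolis(data).mean())
    print("VI (mean, std):   ", gaussian_vi(data))
```

In this toy case the variational family contains the true posterior, so both estimates should land close to the exact answer. In realistic models the variational family is usually too simple to match the posterior, and the trade-off between the two approaches (sampling cost versus approximation bias) becomes the central question.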

Zoltowski, David M.; Cai, Diana; Adams, Ryan P.

Slice sampling reparameterization gradients Conference

Advances in Neural Information Processing Systems 34 (NeurIPS), 2021.

@conference{zoltowski2021slice,

title = {Slice sampling reparameterization gradients},

author = {David M. Zoltowski and Diana Cai and Ryan P. Adams},

year = {2021},

date = {2021-12-01},

booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS)},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Luo, Yucen; Beatson, Alex; Norouzi, Mohammad; Zhu, Jun; Duvenaud, David; Adams, Ryan P.; Chen, Ricky T. Q.

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models Conference

Proceedings of the Eighth International Conference on Learning Representations (ICLR), 2020.

@conference{luo2020sumo,

title = {SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models},

author = {Yucen Luo and

Alex Beatson and

Mohammad Norouzi and

Jun Zhu and

David Duvenaud and

Ryan P. Adams and

Ricky T. Q. Chen},

url = {https://openreview.net/forum?id=SylkYeHtwr},

year = {2020},

date = {2020-04-30},

booktitle = {Proceedings of the Eighth International Conference on Learning Representations (ICLR)},

abstract = {The standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize its variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Beatson, Alex; Adams, Ryan P.

Efficient Optimization of Loops and Limits with Randomized Telescoping Sums Conference

Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.

@conference{beatson2019efficient,

title = {Efficient Optimization of Loops and Limits with Randomized Telescoping Sums},

author = {Alex Beatson and

Ryan P. Adams},

url = {https://www.cs.princeton.edu/~rpa/pubs/beatson2019efficient.pdf},

year = {2019},

date = {2019-06-13},

booktitle = {Proceedings of the 36th International Conference on Machine Learning (ICML)},

abstract = {We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Regier, Jeffrey; Miller, Andrew C.; Schlegel, David; Adams, Ryan P.; McAuliffe, Jon D.; Prabhat

Approximate inference for constructing astronomical catalogs from images Journal Article

In: Annals of Applied Statistics, vol. 13, no. 3, pp. 1884–1926, 2019.

@article{regier2019approximate,

title = {Approximate inference for constructing astronomical catalogs from images},

author = {Jeffrey Regier and

Andrew C. Miller and

David Schlegel and

Ryan P. Adams and

Jon D. McAuliffe and

Prabhat},

url = {https://www.cs.princeton.edu/~rpa/pubs/regier2019approximate.pdf},

year = {2019},

date = {2019-03-01},

journal = {Annals of Applied Statistics},

volume = {13},

number = {3},

pages = {1884--1926},

abstract = {We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Miller, Andrew C.; Foti, Nicholas J.; Adams, Ryan P.

Variational Boosting: Iteratively Refining Posterior Approximations Conference

Proceedings of the 34th International Conference on Machine Learning (ICML), 2017, (arXiv:1611.06585 [stat.ML]).

@conference{miller2017boosting,

title = {Variational Boosting: Iteratively Refining Posterior Approximations},

author = {Andrew C. Miller and Nicholas J. Foti and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/miller2017boosting.pdf},

year = {2017},

date = {2017-01-01},

booktitle = {Proceedings of the 34th International Conference on Machine Learning (ICML)},

abstract = {We propose a black-box variational inference method to

approximate intractable distributions with an increasingly

rich approximating class. Our method, termed variational

boosting, iteratively refines an existing variational

approximation by solving a sequence of optimization problems,

allowing the practitioner to trade computation time for

accuracy. We show how to expand the variational approximating

class by incorporating additional covariance structure and by

introducing new components to form a mixture. We apply

variational boosting to synthetic and real statistical models,

and show that resulting posterior inferences compare favorably

to existing posterior approximation algorithms in both

accuracy and efficiency.},

note = {arXiv:1611.06585 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Miller, Andrew C.; Foti, Nicholas J.; d'Amour, Alexander; Adams, Ryan P.

Reducing Reparameterization Gradient Variance Conference

Advances in Neural Information Processing Systems (NIPS) 30, 2017, (arXiv:1705.07880 [stat.ML]).

@conference{miller2017reducing,

title = {Reducing Reparameterization Gradient Variance},

author = {Andrew C. Miller and Nicholas J. Foti and Alexander d'Amour and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/miller2017reducing.pdf},

year = {2017},

date = {2017-01-01},

booktitle = {Advances in Neural Information Processing Systems (NIPS) 30},

abstract = {Optimization with noisy gradients has become ubiquitous in

statistics and machine learning. Reparameterization gradients,

or gradient estimates computed via the "reparameterization

trick," represent a class of noisy gradients often used in

Monte Carlo variational inference (MCVI). However, when these

gradient estimators are too noisy, the optimization procedure

can be slow or fail to converge. One way to reduce noise is to

use more samples for the gradient estimate, but this can be

computationally expensive. Instead, we view the noisy gradient

as a random variable, and form an inexpensive approximation of

the generating procedure for the gradient sample. This

approximation has high correlation with the noisy gradient by

construction, making it a useful control variate for variance

reduction. We demonstrate our approach on non-conjugate

multi-level hierarchical models and a Bayesian neural net

where we observed gradient variance reductions of multiple

orders of magnitude (20-2,000x).},

note = {arXiv:1705.07880 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Angelino, Elaine; Johnson, Matthew J.; Adams, Ryan P.

Patterns of Scalable Bayesian Inference Journal Article

In: Foundations and Trends in Machine Learning, vol. 9, no. 2-3, pp. 119–247, 2016, (arXiv:1602.05221 [stat.ML]).

@article{angelino2016patterns,

title = {Patterns of Scalable Bayesian Inference},

author = {Elaine Angelino and Matthew J. Johnson and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/angelino2016patterns.pdf},

year = {2016},

date = {2016-01-01},

journal = {Foundations and Trends in Machine Learning},

volume = {9},

number = {2-3},

pages = {119--247},

abstract = {Datasets are growing not just in size but in complexity,

creating a demand for rich models and quantification of

uncertainty. Bayesian methods are an excellent fit for this

demand, but scaling Bayesian inference is a challenge. In

response to this challenge, there has been considerable recent

work based on varying assumptions about model structure,

underlying computational resources, and the importance of

asymptotic correctness. As a result, there is a zoo of ideas

with few clear overarching principles. In this paper, we seek

to identify unifying principles, patterns, and intuitions for

scaling Bayesian inference. We review existing work on

utilizing modern computing resources with both MCMC and

variational approximation techniques. From this taxonomy of

ideas, we characterize the general principles that have proven

successful for designing scalable inference procedures and

comment on the path forward.},

note = {arXiv:1602.05221 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Duvenaud, David; Maclaurin, Dougal; Adams, Ryan P.

Early Stopping is Nonparametric Variational Inference Conference

Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, (arXiv:1504.01344 [stat.ML]).

@conference{duvenaud2016early,

title = {Early Stopping is Nonparametric Variational Inference},

author = {David Duvenaud and Dougal Maclaurin and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/duvenaud2016early.pdf},

year = {2016},

date = {2016-01-01},

booktitle = {Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)},

abstract = {We show that unconverged stochastic gradient descent can be

interpreted as a procedure that samples from a nonparametric

variational approximate posterior distribution. This

distribution is implicitly defined as the transformation of an

initial distribution by a sequence of optimization updates. By

tracking the change in entropy over this sequence of

transformations during optimization, we form a scalable,

unbiased estimate of the variational lower bound on the log

marginal likelihood. We can use this bound to optimize

hyperparameters instead of using cross-validation. This

Bayesian interpretation of SGD suggests improved,

overfitting-resistant optimization procedures, and gives a

theoretical foundation for popular tricks such as early

stopping and ensembling. We investigate the properties of this

marginal likelihood estimator on neural network models.},

note = {arXiv:1504.01344 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Johnson, Matthew J.; Duvenaud, David; Wiltschko, Alexander B.; Datta, Sandeep Robert; Adams, Ryan P.

Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference Conference

Advances in Neural Information Processing Systems (NIPS) 29, 2016, (arXiv:1603.06277 [stat.ML]).

@conference{johnson2016svae,

title = {Composing Graphical Models with Neural Networks for Structured Representations and Fast Inference},

author = {Matthew J. Johnson and David Duvenaud and Alexander B. Wiltschko and Sandeep Robert Datta and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/johnson2016svae.pdf},

year = {2016},

date = {2016-01-01},

booktitle = {Advances in Neural Information Processing Systems (NIPS) 29},

abstract = {We propose a general modeling and inference framework that

composes probabilistic graphical models with deep learning

methods and combines their respective strengths. Our model

family augments graphical structure in latent variables with

neural network observation models. For inference, we extend

variational autoencoders to use graphical model approximating

distributions with recognition networks that output conjugate

potentials. All components of these models are learned

simultaneously with a single objective, giving a scalable

algorithm that leverages stochastic variational inference,

natural gradients, graphical model message passing, and the

reparameterization trick. We illustrate this framework with

several example models and an application to mouse behavioral

phenotyping.},

note = {arXiv:1603.06277 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Grosse, Roger B.; Ghahramani, Zoubin; Adams, Ryan P.

Sandwiching the Marginal Likelihood Using Bidirectional Monte Carlo Unpublished

2016, (arXiv:1511.02543 [stat.ML]).

@unpublished{grosse2015sandwiching,

title = {Sandwiching the Marginal Likelihood Using Bidirectional Monte Carlo},

author = {Roger B. Grosse and Zoubin Ghahramani and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/grosse2015sandwiching.pdf},

year = {2016},

date = {2016-01-01},

abstract = {Computing the marginal likelihood (ML) of a model requires

marginalizing out all of the parameters and latent variables,

a difficult high-dimensional summation or integration

problem. To make matters worse, it is often hard to measure

the accuracy of one’s ML estimates. We present bidirectional

Monte Carlo, a technique for obtaining accurate log-ML

estimates on data simulated from a model. This method obtains

stochastic lower bounds on the log-ML using annealed

importance sampling or sequential Monte Carlo, and obtains

stochastic upper bounds by running these same algorithms in

reverse starting from an exact posterior sample. The true

value can be sandwiched between these two stochastic bounds

with high probability. Using the ground truth log-ML estimates

obtained from our method, we quantitatively evaluate a wide

variety of existing ML estimators on several latent variable

models: clustering, a low rank approximation, and a binary

attributes model. These experiments yield insights into how to

accurately estimate marginal likelihoods.},

note = {arXiv:1511.02543 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {unpublished}

}

Rao, Vinayak; Adams, Ryan P.; Dunson, David B.

Bayesian Inference for Matérn Repulsive Processes Journal Article

In: Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 79, no. 3, pp. 877–897, 2016, (arXiv:1308.1136 [stat.ME]).

@article{rao2016matern,

title = {Bayesian Inference for Matérn Repulsive Processes},

author = {Vinayak Rao and Ryan P. Adams and David B. Dunson},

url = {http://www.cs.princeton.edu/~rpa/pubs/rao2016matern.pdf},

year = {2016},

date = {2016-01-01},

journal = {Journal of the Royal Statistical Society: Series B (Statistical Methodology)},

volume = {79},

number = {3},

pages = {877--897},

abstract = {In many applications involving point pattern data, the Poisson

process assumption is unrealistic, with the data exhibiting a

more regular spread. Such a repulsion between events is

exhibited by trees for example, because of competition for

light and nutrients. Other examples include the locations of

biological cells and cities, and the times of neuronal

spikes. Given the many applications of repulsive point

processes, there is a surprisingly limited literature

developing flexible, realistic and interpretable models, as

well as efficient inferential methods. We address this gap by

developing a modelling framework around the Matérn type-III

repulsive process. We consider a number of extensions of the

original Matérn type-III process for both the homogeneous

and inhomogeneous cases. We also derive the probability

density of this generalized Matérn process. This allows us

to characterize the posterior distribution of the various

latent variables, and leads to a novel and efficient Markov

chain Monte Carlo algorithm. We apply our ideas to datasets

involving the spatial locations of trees, nerve fiber cells

and Greyhound bus stations.},

note = {arXiv:1308.1136 [stat.ME]},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Regier, Jeffrey; Miller, Andrew C.; McAuliffe, Jon; Adams, Ryan P.; Hoffman, Matthew D.; Lang, Dustin; Schlegel, David; Prabhat

Celeste: Variational Inference for a Generative Model of Astronomical Images Conference

Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015, (arXiv:1506.01351 [astro-ph.IM]).

@conference{regier2015celeste,

title = {Celeste: Variational Inference for a Generative Model of Astronomical Images},

author = {Jeffrey Regier and Andrew C. Miller and Jon McAuliffe and Ryan P. Adams and Matthew D. Hoffman and Dustin Lang and David Schlegel and Prabhat},

url = {http://www.cs.princeton.edu/~rpa/pubs/regier2015celeste.pdf},

year = {2015},

date = {2015-01-01},

booktitle = {Proceedings of the 32nd International Conference on Machine Learning (ICML)},

abstract = {We present a new, fully generative model of optical telescope

image sets, along with a variational procedure for

inference. Each pixel intensity is treated as a Poisson random

variable, with a rate parameter dependent on latent properties

of stars and galaxies. Key latent properties are themselves

random, with scientific prior distributions constructed from

large ancillary data sets. We check our approach on synthetic

images. We also run it on images from a major sky survey,

where it exceeds the performance of the current

state-of-the-art method for locating celestial bodies and

measuring their colors.},

note = {arXiv:1506.01351 [astro-ph.IM]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Linderman, Scott W.; Adams, Ryan P.

Scalable Bayesian Inference for Excitatory Point Process Networks Unpublished

2015, (arXiv:1507.03228 [stat.ML]).

@unpublished{linderman2015scalable,

title = {Scalable Bayesian Inference for Excitatory Point Process Networks},

author = {Scott W. Linderman and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/linderman2015scalable.pdf},

year = {2015},

date = {2015-01-01},

abstract = {Networks capture our intuition about relationships in the

world. They describe the friendships between Facebook users,

interactions in financial markets, and synapses connecting

neurons in the brain. These networks are richly structured

with cliques of friends, sectors of stocks, and a smorgasbord

of cell types that govern how neurons connect. Some networks,

like social network friendships, can be directly observed, but

in many cases we only have an indirect view of the network

through the actions of its constituents and an understanding

of how the network mediates that activity. In this work, we

focus on the problem of latent network discovery in the case

where the observable activity takes the form of a

mutually-excitatory point process known as a Hawkes

process. We build on previous work that has taken a Bayesian

approach to this problem, specifying prior distributions over

the latent network structure and a likelihood of observed

activity given this network. We extend this work by proposing

a discrete-time formulation and developing a computationally

efficient stochastic variational inference (SVI) algorithm

that allows us to scale the approach to long sequences of

observations. We demonstrate our algorithm on the calcium

imaging data used in the Chalearn neural connectomics

challenge.},

note = {arXiv:1507.03228 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {unpublished}

}

Linderman, Scott W.; Johnson, Matthew J.; Adams, Ryan P.

Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation Conference

Advances in Neural Information Processing Systems (NIPS) 28, 2015, (arXiv:1506.05843 [stat.ML]).

@conference{linderman2015multinomial,

title = {Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation},

author = {Scott W. Linderman and Matthew J. Johnson and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/linderman2015multinomial.pdf},

year = {2015},

date = {2015-01-01},

booktitle = {Advances in Neural Information Processing Systems (NIPS) 28},

abstract = {Many practical modeling problems involve discrete data that are

best represented as draws from multinomial or categorical

distributions. For example, nucleotides in a DNA sequence,

children's names in a given state and year, and text documents

are all commonly modeled with multinomial distributions. In

all of these cases, we expect some form of dependency between

the draws: the nucleotide at one position in the DNA strand

may depend on the preceding nucleotides, children's names are

highly correlated from year to year, and topics in text may be

correlated and dynamic. These dependencies are not naturally

captured by the typical Dirichlet-multinomial

formulation. Here, we leverage a logistic stick-breaking

representation and recent innovations in Polya-gamma

augmentation to reformulate the multinomial distribution in

terms of latent variables with jointly Gaussian likelihoods,

enabling us to take advantage of a host of Bayesian inference

techniques for Gaussian models with minimal overhead.},

note = {arXiv:1506.05843 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Nishihara, Robert; Murray, Iain; Adams, Ryan P.

Parallel MCMC with Generalized Elliptical Slice Sampling Journal Article

In: Journal of Machine Learning Research, vol. 15, no. 1, pp. 2087-2112, 2014.

@article{nishihara2014generalized,

title = {Parallel MCMC with Generalized Elliptical Slice Sampling},

author = {Robert Nishihara and Iain Murray and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/nishihara2014generalized.pdf},

year = {2014},

date = {2014-01-01},

journal = {Journal of Machine Learning Research},

volume = {15},

number = {1},

pages = {2087-2112},

abstract = {Probabilistic models are conceptually powerful tools for finding

structure in data, but their practical effectiveness is often

limited by our ability to perform inference in them. Exact

inference is frequently intractable, so approximate inference

is often performed using Markov chain Monte Carlo (MCMC). To

achieve the best possible results from MCMC, we want to

efficiently simulate many steps of a rapidly mixing Markov

chain which leaves the target distribution invariant. Of

particular interest in this regard is how to take advantage of

multi-core computing to speed up MCMC-based inference, both to

improve mixing and to distribute the computational load. In

this paper, we present a parallelizable Markov chain Monte

Carlo algorithm for efficiently sampling from continuous

probability distributions that can take advantage of hundreds

of cores. This method shares information between parallel

Markov chains to build a scale-location mixture of Gaussians

approximation to the density function of the target

distribution. We combine this approximation with a recently

developed method known as elliptical slice sampling to create

a Markov chain with no step-size parameters that can mix

rapidly without requiring gradient or curvature computations.},

keywords = {},

pubstate = {published},

tppubtype = {article}

}

Affandi, Raja Hafiz; Fox, Emily B.; Adams, Ryan P.; Taskar, Ben

Learning the Parameters of Determinantal Point Process Kernels Conference

Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.

@conference{affandi2014determinantal,

title = {Learning the Parameters of Determinantal Point Process Kernels},

author = {Raja Hafiz Affandi and Emily B. Fox and Ryan P. Adams and Ben Taskar},

url = {http://www.cs.princeton.edu/~rpa/pubs/affandi2014determinantal.pdf},

year = {2014},

date = {2014-01-01},

booktitle = {Proceedings of the 31st International Conference on Machine Learning (ICML)},

abstract = {Determinantal point processes (DPPs) are well-suited for

modeling repulsion and have proven useful in many applications

where diversity is desired. While DPPs have many appealing

properties, such as efficient sampling, learning the

parameters of a DPP is still considered a difficult problem

due to the non-convex nature of the likelihood function. In

this paper, we propose using Bayesian methods to learn the DPP

kernel parameters. These methods are applicable in large-scale

and continuous DPP settings even when the exact form of the

eigendecomposition is unknown. We demonstrate the utility of

our DPP learning methods in studying the progression of

diabetic neuropathy based on spatial distribution of nerve

fibers, and in studying human perception of diversity in

images.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Maclaurin, Dougal; Adams, Ryan P.

Firefly Monte Carlo: Exact MCMC with Subsets of Data Conference

Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014, (arXiv:1403.5693 [stat.ML]).

@conference{maclaurin2014firefly,

title = {Firefly Monte Carlo: Exact MCMC with Subsets of Data},

author = {Dougal Maclaurin and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/maclaurin2014firefly.pdf},

year = {2014},

date = {2014-01-01},

booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI)},

abstract = {Markov chain Monte Carlo (MCMC) is a popular and successful

general-purpose tool for Bayesian inference. However, MCMC

cannot be practically applied to large data sets because of

the prohibitive cost of evaluating every likelihood term at

every iteration. Here we present Firefly Monte Carlo (FlyMC),

an auxiliary variable MCMC algorithm that only queries the

likelihoods of a potentially small subset of the data at each

iteration yet simulates from the exact posterior distribution,

in contrast to recent proposals that are approximate even in

the asymptotic limit. FlyMC is compatible with a wide variety

of modern MCMC algorithms, and only requires a lower bound on

the per-datum likelihood factors. In experiments, we find that

FlyMC generates samples from the posterior more than an order

of magnitude faster than regular MCMC, opening up MCMC methods

to larger datasets than were previously considered feasible.},

note = {arXiv:1403.5693 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Angelino, Elaine; Kohler, Eddie; Waterland, Amos; Seltzer, Margo; Adams, Ryan P.

Accelerating MCMC via Parallel Predictive Prefetching Conference

Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014, (arXiv:1403.7265 [stat.ML]).

@conference{angelino2014accelerating,

title = {Accelerating MCMC via Parallel Predictive Prefetching},

author = {Elaine Angelino and Eddie Kohler and Amos Waterland and Margo Seltzer and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/angelino2014accelerating.pdf},

year = {2014},

date = {2014-01-01},

booktitle = {Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI)},

abstract = {Parallel predictive prefetching is a new framework for

accelerating a large class of widely-used Markov chain Monte

Carlo (MCMC) algorithms. It speculatively evaluates many

potential steps of an MCMC chain in parallel while exploiting

fast, iterative approximations to the target density. This

can accelerate sampling from target distributions in Bayesian

inference problems. Our approach takes advantage of whatever

parallel resources are available, but produces results exactly

equivalent to standard serial execution. In the initial

burn-in phase of chain evaluation, we achieve speedup close to

linear in the number of available cores.},

note = {arXiv:1403.7265 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Lovell, Dan; Malmaud, Jonathan; Adams, Ryan P.; Mansinghka, Vikash K.

ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures Unpublished

2013, (arXiv:1304.2302 [stat.ML]).

@unpublished{lovell2013cluster,

title = {ClusterCluster: Parallel Markov Chain Monte Carlo for Dirichlet Process Mixtures},

author = {Dan Lovell and Jonathan Malmaud and Ryan P. Adams and Vikash K. Mansinghka},

url = {http://www.cs.princeton.edu/~rpa/pubs/lovell2013cluster.pdf},

year = {2013},

date = {2013-01-01},

abstract = {The Dirichlet process (DP) is a fundamental mathematical tool

for Bayesian nonparametric modeling, and is widely used in

tasks such as density estimation, natural language processing,

and time series modeling. Although MCMC inference methods for

the DP often provide a gold standard in terms of asymptotic

accuracy, they can be computationally expensive and are not

obviously parallelizable. We propose a reparameterization of

the Dirichlet process that induces conditional independencies

between the atoms that form the random measure. This

conditional independence enables many of the Markov chain

transition operators for DP inference to be simulated in

parallel across multiple cores. Applied to mixture modeling,

our approach enables the Dirichlet process to simultaneously

learn clusters that describe the data and superclusters that

define the granularity of parallelization. Unlike previous

approaches, our technique does not require alteration of the

model and leaves the true posterior distribution invariant. It

also naturally lends itself to a distributed software

implementation in terms of Map-Reduce, which we test in

cluster configurations of over 50 machines and 100 cores. We

present experiments exploring the parallel efficiency and

convergence properties of our approach on both synthetic and

real-world data, including runs on 1MM data vectors in 256

dimensions.},

note = {arXiv:1304.2302 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {unpublished}

}

Murray, Iain; Adams, Ryan P.; MacKay, David J. C.

Elliptical Slice Sampling Conference

Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, (arXiv:1001.0175 [stat.CO]).

@conference{murray2010elliptical,

title = {Elliptical Slice Sampling},

author = {Iain Murray and Ryan P. Adams and David J.C. MacKay},

url = {http://www.cs.princeton.edu/~rpa/pubs/murray2010elliptical.pdf},

year = {2010},

date = {2010-01-01},

booktitle = {Proceedings of the 13th International Conference on Artificial

Intelligence and Statistics (AISTATS)},

abstract = {Many probabilistic models introduce strong dependencies

between variables using a latent multivariate Gaussian

distribution or a Gaussian process. We present a new Markov

chain Monte Carlo algorithm for performing inference in models

with multivariate Gaussian priors. Its key properties are: 1)

it has simple, generic code applicable to many models, 2) it

has no free parameters, 3) it works well for a variety of

Gaussian process based models. These properties make our

method ideal for use while model building, removing the need

to spend time deriving and tuning updates for more complex

algorithms.},

note = {arXiv:1001.0175 [stat.CO]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Adams, Ryan P.; Wallach, Hanna M.; Ghahramani, Zoubin

Learning the Structure of Deep Sparse Graphical Models Conference

Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, (arXiv:1001.0160 [stat.ML]).

@conference{adams2010deep,

title = {Learning the Structure of Deep Sparse Graphical Models},

author = {Ryan P. Adams and Hanna M. Wallach and Zoubin Ghahramani},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2010deep.pdf},

year = {2010},

date = {2010-01-01},

booktitle = {Proceedings of the 13th International Conference on Artificial

Intelligence and Statistics (AISTATS)},

abstract = {Deep belief networks are a powerful way to model complex

probability distributions. However, learning the structure of

a belief network, particularly one with hidden units, is

difficult. The Indian buffet process has been used as a

nonparametric Bayesian prior on the directed structure of a

belief network with a single infinitely wide hidden layer. In

this paper, we introduce the cascading Indian buffet process

(CIBP), which provides a nonparametric prior on the structure

of a layered, directed belief network that is unbounded in

both depth and width, yet allows tractable inference. We use

the CIBP prior with the nonlinear Gaussian belief network so

each unit can additionally vary its behavior between discrete

and continuous representations. We provide Markov chain Monte

Carlo algorithms for inference in these belief networks and

explore the structures learned on several image data sets.},

note = {arXiv:1001.0160 [stat.ML]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Murray, Iain; Adams, Ryan P.

Slice Sampling Covariance Hyperparameters in Latent Gaussian Models Conference

Advances in Neural Information Processing Systems (NIPS) 23, 2010, (arXiv:1006.0868 [stat.CO]).

@conference{murray2010hyper,

title = {Slice Sampling Covariance Hyperparameters in Latent

Gaussian Models},

author = {Iain Murray and Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/murray2010hyper.pdf},

year = {2010},

date = {2010-01-01},

booktitle = {Advances in Neural Information Processing Systems (NIPS) 23},

abstract = {The Gaussian process (GP) is a popular way to specify

dependencies between random variables in a probabilistic

model. In the Bayesian framework the covariance structure can

be specified using unknown hyperparameters. Integrating over

these hyperparameters considers different possible

explanations for the data when making predictions. This

integration is often performed using Markov chain Monte Carlo

(MCMC) sampling. However, with non-Gaussian observations

standard hyperparameter sampling approaches require careful

tuning and may converge slowly. In this paper we present a

slice sampling approach that requires little tuning while

mixing well in both strong- and weak-data regimes.},

note = {arXiv:1006.0868 [stat.CO]},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Adams, Ryan P.; Ghahramani, Zoubin

Archipelago: Nonparametric Bayesian Semi-Supervised Learning Conference

Proceedings of the 26th International Conference on Machine Learning (ICML), 2009.

@conference{adams2009archipelago,

title = {Archipelago: Nonparametric Bayesian Semi-Supervised

Learning},

author = {Ryan P. Adams and Zoubin Ghahramani},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009archipelago.pdf},

year = {2009},

date = {2009-01-01},

booktitle = {Proceedings of the 26th International Conference on

Machine Learning (ICML)},

abstract = {Semi-supervised learning (SSL) is classification where

additional unlabeled data can be used to improve

accuracy. Generative approaches are appealing in this

situation, as a model of the data's probability density can

assist in identifying clusters. Nonparametric Bayesian

methods, while ideal in theory due to their principled

motivations, have been difficult to apply to SSL in

practice. We present a nonparametric Bayesian method that uses

Gaussian processes for the generative model, avoiding many of

the problems associated with Dirichlet process mixture

models. Our model is fully generative and we take advantage of

recent advances in Markov chain Monte Carlo algorithms to

provide a practical inference method. Our method compares

favorably to competing approaches on synthetic and real-world

multi-class data.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Adams, Ryan P.; Murray, Iain; MacKay, David J. C.

The Gaussian Process Density Sampler Conference

Advances in Neural Information Processing Systems 21 (NIPS), 2009.

@conference{adams2009gpds,

title = {The Gaussian Process Density Sampler},

author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009gpds.pdf},

year = {2009},

date = {2009-01-01},

booktitle = {Advances in Neural Information Processing Systems 21 (NIPS)},

abstract = {We present the Gaussian Process Density Sampler (GPDS), an

exchangeable generative model for use in nonparametric

Bayesian density estimation. Samples drawn from the GPDS are

consistent with exact, independent samples from a fixed

density function that is a transformation of a function drawn

from a Gaussian process prior. Our formulation allows us to

infer an unknown density from data using Markov chain Monte

Carlo, which gives samples from the posterior distribution

over density functions and from the predictive distribution on

data space. We can also infer the hyperparameters of the

Gaussian process. We compare this density modeling technique

to several existing techniques on a toy problem and a

skull-reconstruction task.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Adams, Ryan P.; Murray, Iain; MacKay, David J. C.

Tractable Nonparametric Bayesian Inference in Poisson Processes with Gaussian Process Intensities Conference

Proceedings of the 26th International Conference on Machine Learning (ICML), Montréal, Canada, 2009.

@conference{adams2009poisson,

title = {Tractable Nonparametric Bayesian Inference in Poisson

Processes with Gaussian Process Intensities},

author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009gpds.pdf},

year = {2009},

date = {2009-01-01},

booktitle = {Proceedings of the 26th International Conference on

Machine Learning (ICML)},

address = {Montréal, Canada},

abstract = {The inhomogeneous Poisson process is a point process that has

varying intensity across its domain (usually time or

space). For nonparametric Bayesian modeling, the Gaussian

process is a useful way to place a prior distribution on this

intensity. The combination of a Poisson process and GP is

known as a Gaussian Cox process, or doubly-stochastic Poisson

process. Likelihood-based inference in these models requires

an intractable integral over an infinite-dimensional random

function. In this paper we present the first approach to

Gaussian Cox processes in which it is possible to perform

inference without introducing approximations or

finite-dimensional proxy distributions. We call our method the

Sigmoidal Gaussian Cox Process, which uses a generative model

for Poisson data to enable tractable inference via Markov

chain Monte Carlo. We compare our method to competing methods

on synthetic data and apply it to several real-world data

sets.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}

Adams, Ryan P.; Murray, Iain; MacKay, David J. C.

Nonparametric Bayesian Density Modeling with Gaussian Processes Unpublished

2009, (arXiv:0912.4896 [stat.CO]).

@unpublished{adams2009nonparametric,

title = {Nonparametric Bayesian Density Modeling with Gaussian Processes},

author = {Ryan P. Adams and Iain Murray and David J.C. MacKay},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009nonparametric.pdf},

year = {2009},

date = {2009-01-01},

abstract = {We present the Gaussian process density sampler (GPDS), an

exchangeable generative model for use in nonparametric

Bayesian density estimation. Samples drawn from the GPDS are

consistent with exact, independent samples from a distribution

defined by a density that is a transformation of a function

drawn from a Gaussian process prior. Our formulation allows us

to infer an unknown density from data using Markov chain Monte

Carlo, which gives samples from the posterior distribution

over density functions and from the predictive distribution on

data space. We describe two such MCMC methods. Both methods

also allow inference of the hyperparameters of the Gaussian

process.},

note = {arXiv:0912.4896 [stat.CO]},

keywords = {},

pubstate = {published},

tppubtype = {unpublished}

}

Adams, Ryan P.

Kernel Methods for Nonparametric Bayesian Inference of Probability Densities and Point Processes PhD Thesis

University of Cambridge, 2009.

@phdthesis{adams2009thesis,

title = {Kernel Methods for Nonparametric Bayesian Inference of

Probability Densities and Point Processes},

author = {Ryan P. Adams},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2009thesis.pdf},

year = {2009},

date = {2009-01-01},

address = {Cambridge, UK},

school = {University of Cambridge},

abstract = {Nonparametric kernel methods for estimation of probability

densities and point process intensities have long been of

interest to researchers in statistics and machine

learning. Frequentist kernel methods are widely used, but

provide only a point estimate of the unknown

density. Additionally, in frequentist kernel density methods,

it can be difficult to select appropriate kernel

parameters. The Bayesian approach to inference potentially

resolves both of these deficiencies, by providing a

distribution over the unknowns and enabling a principled

approach to kernel selection. Constructing a Bayesian

nonparametric kernel density method has proven to be

difficult, however, due to the need to integrate over an

infinite-dimensional random function in order to evaluate the

likelihood. To avoid this intractability, all Bayesian kernel

density methods to date have either used a crippled model or a

finite-dimensional approximation. Recent advances in Markov

chain Monte Carlo methods have improved the situation for

these doubly-intractable posterior distributions, however. If

data can be generated exactly from the model, then it is

possible to perform inference without computing the

intractable likelihood. I propose two new kernel-based models

that enable an exact generative procedure: the Gaussian

process density sampler (GPDS) for probability density

functions, and the sigmoidal Gaussian Cox process (SGCP) for

the Poisson process. With generative priors, I show how it is

now possible to construct two different kinds of Markov

chains for inference in these models. These Markov chains have

the desired posterior distribution as their equilibrium

distributions, and, despite a parameter space with uncountably

many dimensions, require only a finite amount of computation

to simulate. The GPDS and SGCP, and the associated inference

procedures, are the first kernel-based nonparametric Bayesian

methods that allow inference without a finite-dimensional

approximation. I also present several additional kernel-based

models for data that extend the Gaussian process density

sampler and sigmoidal Gaussian Cox process to other

situations. The Archipelago model extends the GPDS to address

the task of semi-supervised learning, where a flexible density

estimate can improve the performance of a classifier when

unlabeled data are available. I also generalise the SGCP to

enable a nonparametric inhomogeneous Neyman–Scott

process, and present a soft-core generalisation of the Matérn

repulsive process that similarly allows

non-approximate inference via Markov chain Monte Carlo.},

keywords = {},

pubstate = {published},

tppubtype = {phdthesis}

}

Adams, Ryan P.; Stegle, Oliver

Gaussian Process Product Models for Nonparametric Nonstationarity Conference

Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.

@conference{adams2008gppm,

title = {Gaussian Process Product Models for Nonparametric

Nonstationarity},

author = {Ryan P. Adams and Oliver Stegle},

url = {http://www.cs.princeton.edu/~rpa/pubs/adams2008gppm.pdf},

year = {2008},

date = {2008-01-01},

booktitle = {Proceedings of the 25th International Conference on

Machine Learning (ICML)},

pages = {1--8},

address = {Helsinki, Finland},

abstract = {Stationarity is often an unrealistic prior assumption for

Gaussian process regression. One solution is to predefine an

explicit nonstationary covariance function, but such

covariance functions can be difficult to specify and require

detailed prior knowledge of the nonstationarity. We propose

the Gaussian process product model (GPPM) which models data as

the pointwise product of two latent Gaussian processes to

nonparametrically infer nonstationary variations of

amplitude. This approach differs from other nonparametric

approaches to covariance function inference in that it

operates on the outputs rather than the inputs, resulting in a

significant reduction in computational cost and required data

for inference. We present an approximate inference scheme

using Expectation Propagation. This variational approximation

yields convenient GP hyperparameter selection and compact

approximate predictive distributions.},

keywords = {},

pubstate = {published},

tppubtype = {conference}

}
