Automating the Design of Chemical Compounds and Reactions
Machine learning offers tools for automating chemical discovery: choosing which reactions to run, proposing and screening candidate molecules, and predicting their properties. The publications below cover Bayesian optimization of reaction conditions, generative models and learned continuous representations of molecules, graph neural networks for molecular fingerprints, neural prediction of mass spectra, high-throughput virtual screening of organic light-emitting diode materials, and probabilistic inference implemented directly in chemical reaction networks.
Shields, Benjamin J.; Stevens, Jason; Li, Jun; Parasram, Marvin; Damani, Farhan; Alvarado, Jesus I. Martinez; Janey, Jacob M.; Adams, Ryan P.; Doyle, Abigail G.
Bayesian reaction optimization as a tool for chemical synthesis Journal Article
In: Nature, vol. 590, pp. 89–96, 2021.
@article{shields2021bayesian,
title = {Bayesian reaction optimization as a tool for chemical synthesis},
author = {Benjamin J. Shields and Jason Stevens and Jun Li and Marvin Parasram and Farhan Damani and Jesus I. Martinez Alvarado and Jacob M. Janey and Ryan P. Adams and Abigail G. Doyle},
year = {2021},
date = {2021-04-01},
journal = {Nature},
volume = {590},
pages = {89--96},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
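To make the idea of Bayesian reaction optimization concrete, here is a minimal sketch of the generic loop (Gaussian-process surrogate plus expected-improvement acquisition) over a discrete pool of featurized reaction conditions. It is written in plain NumPy, it is not the authors' implementation, and the descriptors and the simulated yield function are purely hypothetical.

import math
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-4):
    # Gaussian-process posterior mean and variance at the candidate points.
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    L = np.linalg.cholesky(K)
    K_s = rbf_kernel(X_obs, X_cand)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_cand, X_cand)) - (v ** 2).sum(0)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # EI acquisition for maximization.
    sigma = np.sqrt(var)
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.array([math.erf(zi / math.sqrt(2.0)) for zi in z]))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * Phi + sigma * phi

# Hypothetical candidate pool: each row is a featurized set of reaction
# conditions (e.g., ligand/base/solvent/temperature descriptors).
X_cand = rng.uniform(-2.0, 2.0, size=(200, 3))

def simulated_yield(x):
    # Stand-in for actually running the reaction and measuring its yield.
    return float(100.0 * np.exp(-np.sum((x - 0.5) ** 2)))

# Start from a few random experiments, then iterate the surrogate/acquire loop.
obs_idx = list(rng.choice(len(X_cand), size=5, replace=False))
y_obs = [simulated_yield(X_cand[i]) for i in obs_idx]

for _ in range(15):
    mu, var = gp_posterior(X_cand[obs_idx], np.array(y_obs), X_cand)
    ei = expected_improvement(mu, var, max(y_obs))
    ei[obs_idx] = -np.inf                 # never repeat an experiment
    nxt = int(np.argmax(ei))
    obs_idx.append(nxt)
    y_obs.append(simulated_yield(X_cand[nxt]))

print(f"best simulated yield after {len(y_obs)} experiments: {max(y_obs):.1f}%")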
Seff, Ari; Zhou, Wenda; Damani, Farhan; Doyle, Abigail; Adams, Ryan P.
Discrete Object Generation with Reversible Inductive Construction Conference
Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.
@conference{seff2019discrete,
title = {Discrete Object Generation with Reversible Inductive Construction},
author = {Ari Seff and
Wenda Zhou and
Farhan Damani and
Abigail Doyle and
Ryan P. Adams},
url = {https://www.cs.princeton.edu/~rpa/pubs/seff2019discrete.pdf},
year = {2019},
date = {2019-12-04},
booktitle = {Advances in Neural Information Processing Systems 32 (NeurIPS)},
abstract = {The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. Building off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity. This approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories. We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
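As a concrete (if drastically simplified) picture of the corrupt-then-reconstruct Markov chain described in the abstract, the sketch below uses bit strings that are "valid" when no two 1s are adjacent. The learned reconstruction distribution from the paper is replaced here by a uniform stand-in over validity-preserving local edits; only the structure of the chain is illustrated.

import random

def is_valid(bits):
    # Validity constraint for this toy domain: no two adjacent 1s.
    return all(not (a and b) for a, b in zip(bits, bits[1:]))

def local_moves(bits):
    # All single-bit flips that keep the object valid.
    moves = []
    for i in range(len(bits)):
        flipped = bits[:i] + [1 - bits[i]] + bits[i + 1:]
        if is_valid(flipped):
            moves.append(flipped)
    return moves

def corrupt(bits, steps):
    # Random walk over validity-preserving local edits.
    for _ in range(steps):
        bits = random.choice(local_moves(bits))
    return bits

def reconstruct(bits, steps):
    # Placeholder for the learned reconstruction distribution: here it just
    # takes random valid local moves as well.
    for _ in range(steps):
        bits = random.choice(local_moves(bits))
    return bits

random.seed(0)
x = [1, 0, 0, 1, 0, 1, 0, 0]        # a "data" object (valid by construction)
x_tilde = corrupt(x, steps=4)       # still valid, but pushed off the data distribution
x_hat = reconstruct(x_tilde, steps=4)
print(x, x_tilde, x_hat, is_valid(x_hat))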
Wei, Jennifer N.; Belanger, David; Adams, Ryan P.; Sculley, D.
Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks Journal Article
In: ACS Central Science, vol. 5, no. 4, pp. 700–708, 2019.
@article{wei2019rapid,
title = {Rapid Prediction of Electron–Ionization Mass Spectrometry Using Neural Networks},
author = {Jennifer N. Wei and
David Belanger and
Ryan P. Adams and
D. Sculley},
url = {https://www.cs.princeton.edu/~rpa/pubs/wei2019rapid.pdf},
year = {2019},
date = {2019-03-19},
journal = {ACS Central Science},
volume = {5},
number = {4},
pages = {700--708},
abstract = {When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library’s coverage by augmenting it with synthetic spectra that are predicted from candidate molecules using machine learning. We contribute a lightweight neural network model that quickly predicts mass spectra for small molecules, averaging 5 ms per molecule with a recall-at-10 accuracy of 91.8%. Achieving high-accuracy predictions requires a novel neural network architecture that is designed to capture typical fragmentation patterns from electron ionization. We analyze the effects of our modeling innovations on library matching performance and compare our models to prior machine-learning-based work on spectrum prediction.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
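The recall-at-10 number in the abstract comes from a library-matching evaluation. The sketch below illustrates that setup with an untrained toy network over fingerprint bits standing in for the paper's spectral-prediction model; all sizes, weights, and the noise model are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n_mols, fp_bits, mz_bins = 500, 256, 1000

def predict_spectrum(fp, W1, W2):
    # Toy two-layer network: fingerprint -> non-negative spectrum over m/z bins.
    h = np.maximum(0.0, fp @ W1)          # ReLU hidden layer
    return np.maximum(0.0, h @ W2)        # intensities clipped to be non-negative

# Hypothetical fingerprints and (untrained) weights, for illustration only.
fps = rng.integers(0, 2, size=(n_mols, fp_bits)).astype(float)
W1 = rng.normal(0, 0.1, size=(fp_bits, 128))
W2 = rng.normal(0, 0.1, size=(128, mz_bins))
library = np.array([predict_spectrum(fp, W1, W2) for fp in fps])

def recall_at_k(query, library, true_index, k=10):
    # Rank library entries by cosine similarity to the query spectrum and ask
    # whether the true molecule lands in the top k.
    sims = library @ query / (
        np.linalg.norm(library, axis=1) * np.linalg.norm(query) + 1e-12)
    topk = np.argsort(-sims)[:k]
    return true_index in topk

# Pretend molecule 42's measured spectrum is its predicted spectrum plus noise.
query = library[42] + rng.normal(0, 0.01, size=mz_bins)
print(recall_at_k(query, library, true_index=42, k=10))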
Gómez-Bombarelli, Rafael; Wei, Jennifer; Duvenaud, David; Hernández-Lobato, Jose Miguel; Sánchez-Lengeling, Benjamin; Sheberla, Dennis; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Adams, Ryan P.; Aspuru-Guzik, Alan
Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules Journal Article
In: ACS Central Science, vol. 4, no. 2, pp. 268–276, 2018, (arXiv:1610.02415 [cs.LG]).
@article{bombarelli2018automatic,
title = {Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules},
author = {Rafael Gómez-Bombarelli and Jennifer Wei and David Duvenaud and Jose Miguel Hernández-Lobato and Benjamin Sánchez-Lengeling and Dennis Sheberla and Jorge Aguilera-Iparraguirre and Timothy D. Hirzel and Ryan P. Adams and Alan Aspuru-Guzik},
url = {http://www.cs.princeton.edu/~rpa/pubs/bombarelli2018automatic.pdf},
year = {2018},
date = {2018-01-01},
journal = {ACS Central Science},
volume = {4},
number = {2},
pages = {268--276},
abstract = {We report a method to convert discrete representations of
molecules to and from a multidimensional continuous
representation. This model allows us to generate new molecules
for efficient exploration and optimization through open-ended
spaces of chemical compounds. A deep neural network was
trained on hundreds of thousands of existing chemical
structures to construct three coupled functions: an encoder, a
decoder and a predictor. The encoder converts the discrete
representation of a molecule into a real-valued continuous
vector, and the decoder converts these continuous vectors back
to discrete molecular representations. The predictor estimates
chemical properties from the latent continuous vector
representation of the molecule. Continuous representations
allow us to automatically generate novel chemical structures
by performing simple operations in the latent space, such as
decoding random vectors, perturbing known chemical structures,
or interpolating between molecules. Continuous representations
also allow the use of powerful gradient-based optimization to
efficiently guide the search for optimized functional
compounds. We demonstrate our method in the domain of
drug-like molecules and also in the set of molecules with
fewer than nine heavy atoms.},
note = {arXiv:1610.02415 [cs.LG]},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
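The core trick in the abstract is gradient-based search in the learned continuous latent space. The sketch below shows only that step: the encoder and decoder are hypothetical stand-ins (the paper trains a variational autoencoder on SMILES strings), and the property surrogate is a toy quadratic so its gradient can be written analytically.

import numpy as np

rng = np.random.default_rng(0)
latent_dim = 16

def encode(smiles):
    # Stand-in encoder: deterministically maps a molecule to a latent vector.
    local = np.random.default_rng(abs(hash(smiles)) % (2**32))
    return local.normal(size=latent_dim)

def decode(z):
    # Stand-in decoder: would map a latent vector back to a molecule.
    return f"<decoded molecule near a latent point with norm {np.linalg.norm(z):.2f}>"

# Toy differentiable property predictor f(z) = -||z - target||^2 and its gradient.
target = rng.normal(size=latent_dim)
predict_property = lambda z: -np.sum((z - target) ** 2)
grad_property = lambda z: -2.0 * (z - target)

# Gradient ascent on the predicted property, starting from a seed molecule.
z = encode("CCO")                       # seed molecule (ethanol SMILES)
for step in range(100):
    z = z + 0.05 * grad_property(z)     # move toward higher predicted property

print(predict_property(z), decode(z))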
Gómez-Bombarelli, Rafael; Aguilera-Iparraguirre, Jorge; Hirzel, Timothy D.; Duvenaud, David; Maclaurin, Dougal; Blood-Forsythe, Martin A.; Chae, Hyun Sik; Einzinger, Markus; Ha, Dong-Gwang; Wu, Tony; Markopoulos, Georgios; Jeon, Soonok; Kang, Hosuk; Miyazaki, Hiroshi; Numata, Masaki; Kim, Sunghan; Huang, Wenliang; Hong, Seong Ik; Baldo, Marc; Adams, Ryan P.; Aspuru-Guzik, Alan
Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach Journal Article
In: Nature Materials, vol. 15, no. 10, pp. 1120–1127, 2016.
@article{bombarelli2016oleds,
title = {Design of Efficient Molecular Organic Light-Emitting Diodes by a High-Throughput Virtual Screening and Experimental Approach},
author = {Rafael Gómez-Bombarelli and Jorge Aguilera-Iparraguirre and Timothy D. Hirzel and David Duvenaud and Dougal Maclaurin and Martin A. Blood-Forsythe and Hyun Sik Chae and Markus Einzinger and Dong-Gwang Ha and Tony Wu and Georgios Markopoulos and Soonok Jeon and Hosuk Kang and Hiroshi Miyazaki and Masaki Numata and Sunghan Kim and Wenliang Huang and Seong Ik Hong and Marc Baldo and Ryan P. Adams and Alan Aspuru-Guzik},
url = {http://www.cs.princeton.edu/~rpa/pubs/bombarelli2016oleds.pdf},
year = {2016},
date = {2016-01-01},
journal = {Nature Materials},
volume = {15},
number = {10},
pages = {1120--1127},
abstract = {Virtual screening is becoming a ground-breaking tool for
molecular discovery due to the exponential growth of available
computer time and constant improvement of simulation and
machine learning techniques. We report an integrated organic
functional material design process that incorporates
theoretical insight, quantum chemistry, cheminformatics,
machine learning, industrial expertise, organic synthesis,
molecular characterization, device fabrication and
optoelectronic testing. After exploring a search space of 1.6
million molecules and screening over 400,000 of them using
time-dependent density functional theory, we identified
thousands of promising novel organic light-emitting diode
molecules across the visible spectrum. Our team
collaboratively selected the best candidates from this
set. The experimentally determined external quantum
efficiencies for these synthesized candidates were as large as
22%.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
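Schematically, the workflow in the abstract is a screening funnel: a very large candidate pool is triaged with cheap calculations, a shortlist is re-scored with expensive quantum chemistry, and a handful of survivors go to human experts and the lab. The sketch below mimics only that funnel structure; both scoring functions and all pool sizes are hypothetical placeholders.

import numpy as np

rng = np.random.default_rng(0)
candidates = rng.normal(size=(1_000, 8))     # stand-in molecular descriptors

def cheap_surrogate_score(x):
    # Fast, noisy estimate of the figure of merit (hypothetical).
    return -np.sum(x ** 2) + rng.normal(0, 0.5)

def expensive_score(x):
    # Stand-in for an expensive calculation such as TD-DFT.
    return -np.sum(x ** 2)

# Stage 1: cheap scoring of every candidate, keep the top 5%.
cheap = np.array([cheap_surrogate_score(x) for x in candidates])
shortlist = np.argsort(-cheap)[: len(candidates) // 20]

# Stage 2: expensive scoring of the shortlist only.
expensive = {i: expensive_score(candidates[i]) for i in shortlist}

# Stage 3: hand the best few to human experts for final selection.
best = sorted(expensive, key=expensive.get, reverse=True)[:5]
print("candidate indices passed to expert review:", best)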
Duvenaud, David; Maclaurin, Dougal; Aguilera-Iparraguirre, Jorge; Gómez-Bombarelli, Rafael; Hirzel, Timothy D.; Aspuru-Guzik, Alan; Adams, Ryan P.
Convolutional Networks on Graphs for Learning Molecular Fingerprints Conference
Advances in Neural Information Processing Systems (NIPS) 28, 2015, (arXiv:1509.09292 [stat.ML]).
@conference{duvenaud2015fingerprints,
title = {Convolutional Networks on Graphs for Learning Molecular Fingerprints},
author = {David Duvenaud and Dougal Maclaurin and Jorge Aguilera-Iparraguirre and Rafael Gómez-Bombarelli and Timothy D. Hirzel and Alan Aspuru-Guzik and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/duvenaud2015fingerprints.pdf},
year = {2015},
date = {2015-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 28},
abstract = {We introduce a convolutional neural network that operates
directly on graphs. These networks allow end-to-end learning
of prediction pipelines whose inputs are graphs of arbitrary
size and shape. The architecture we present generalizes
standard molecular feature extraction methods based on
circular fingerprints. We show that these data-driven features
are more interpretable, and have better predictive performance
on a variety of tasks.},
note = {arXiv:1509.09292 [stat.ML]},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
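The architecture generalizes circular (ECFP-style) fingerprints by making every step differentiable: each atom repeatedly aggregates its neighbours' features, and at every radius the atom states are softly "hashed" into a fixed-length vector. The NumPy sketch below follows that recipe on a toy four-atom graph with random, untrained weights; the feature sizes and the specific nonlinearities are assumptions, not the paper's exact choices.

import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def neural_fingerprint(adj, atom_feats, weights, out_weights, radius=2):
    # Differentiable analogue of a circular fingerprint.
    fp = np.zeros(out_weights[0].shape[1])
    h = atom_feats
    for r in range(radius):
        # Each atom aggregates itself plus its neighbours, then is transformed.
        h = np.tanh((adj + np.eye(len(adj))) @ h @ weights[r])
        # Soft "hash" of every atom state into fingerprint indices.
        for atom_state in h:
            fp += softmax(atom_state @ out_weights[r])
    return fp

# Toy 4-atom molecule given as an adjacency matrix (a simple path graph).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
atom_feats = rng.normal(size=(4, 6))            # per-atom input features
weights = [rng.normal(size=(6, 6)) for _ in range(2)]
out_weights = [rng.normal(size=(6, 16)) for _ in range(2)]

print(neural_fingerprint(adj, atom_feats, weights, out_weights))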
Napp, Nils; Adams, Ryan P.
Message Passing Inference with Chemical Reaction Networks Conference
Advances in Neural Information Processing Systems (NIPS) 26, 2013.
@conference{napp2013reactions,
title = {Message Passing Inference with Chemical Reaction Networks},
author = {Nils Napp and Ryan P. Adams},
url = {http://www.cs.princeton.edu/~rpa/pubs/napp2013reactions.pdf},
year = {2013},
date = {2013-01-01},
booktitle = {Advances in Neural Information Processing Systems (NIPS) 26},
abstract = {Recent work on molecular programming has explored new
possibilities for computational abstractions with
biomolecules, including logic gates, neural networks, and
linear systems. In the future such abstractions might enable
nanoscale devices that can sense and control the world at a
molecular scale. Just as in macroscale robotics, it is
critical that such devices can learn about their environment
and reason under uncertainty. At this small scale, systems are
typically modeled as chemical reaction networks. In this work,
we develop a procedure that can take arbitrary probabilistic
graphical models, represented as factor graphs over discrete
random variables, and compile them into chemical reaction
networks that implement inference. In particular, we show that
marginalization based on sum-product message passing can be
implemented in terms of reactions between chemical species
whose concentrations represent probabilities. We show
algebraically that the steady state concentration of these
species correspond to the marginal distributions of the random
variables in the graph and validate the results in
simulations. As with standard sum-product inference, this
procedure yields exact results for tree-structured graphs, and
approximate solutions for loopy graphs.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
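A tiny worked example of the central idea, that steady-state species concentrations can represent probabilities: with the two reactions X0 -> X1 at rate w1 and X1 -> X0 at rate w0, mass-action kinetics conserves X0 + X1 and settles to X1 / (X0 + X1) = w1 / (w0 + w1), i.e., the normalized marginal of a binary variable with unnormalized weights (w0, w1). The Euler integration below is an illustrative check of that fixed point, not the paper's compilation procedure.

import numpy as np

w0, w1 = 2.0, 3.0          # unnormalized weights for a binary variable
x = np.array([1.0, 0.0])   # initial concentrations of species X0, X1

dt = 0.01
for _ in range(5_000):     # simple Euler integration of the mass-action ODE
    flux = w1 * x[0] - w0 * x[1]          # net conversion X0 -> X1
    x = x + dt * np.array([-flux, +flux])

marginal = x / x.sum()
print("steady-state concentrations:", x)
print("implied marginal:", marginal)                      # approximately [0.4, 0.6]
print("target marginal: ", [w0 / (w0 + w1), w1 / (w0 + w1)])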