AMLab | Amsterdam Machine Learning Lab

The Amsterdam Machine Learning Lab (AMLab) conducts research in machine learning and artificial intelligence, and their applications to large-scale data domains in science and industry. This includes the development of deep generative models, methods for approximate inference, probabilistic programming, Bayesian deep learning, causal inference, reinforcement learning, graph neural networks, and geometric deep learning.

AMLab comprises 7 faculty: Jan-Willem van de Meent, who serves as director, Max Welling, Herke van Hoof, Patrick Forré, Erik Bekkers, Christian Naesseth, and Sara Magliacane. The lab participates in public-private partnerships with industry through the QUvA Lab (with Qualcomm) and the Delta Lab (with Bosch). The lab also engages in cross-disciplinary collaborations through the AI4Science Lab.

News

Nov 7, 2024	AMLab has an open postdoc position on supporting human decision making using reinforcement learning. You will be working with Herke van Hoof and Frans Oliehoek. Full details and instructions to apply can be found in the official vacancy. The position is part of the AI4REALNET project, which receives funding from the European Union’s Horizon Europe programme, and the Hybrid Intelligence project, which receives funding from the NWO.
May 7, 2024	Congratulations to Durk Kingma and Max Welling on receiving the inaugural test-of-time award at ICLR 2024!
Apr 26, 2024	The BeNeRL workshop on reinforcement learning will take place June 10th in Amsterdam! The programme includes keynotes by Frans Oliehoek, Roxana Radulescu, Thomas Moerland, Thiago Dias Simao, and Yailen Martinez Jimenez. The workshop is free to attend but registration is required. More information and registration via the workshop website. The workshop is supported by Ellis Amsterdam and NWO.
Jan 26, 2024	Sara Magliacane and Herke van Hoof have two open PhD positions on machine learning in the fintech domain (in collaboration with Adyen). One student will work on causal machine learning, and the second student will work on reinforcement learning. Deadline for applications is March 11th. For all details and how to apply, please see the official vacancy.
Jan 4, 2024	We currently have two postdoc openings at AMLab, both with a deadline of February 4th: Christian Naesseth is hiring for a postdoc position focusing on topics relating to generative AI, AI4Science, or uncertainty quantification. This position is part of the UvA Bosch Delta Lab. You can apply here. Jan-Willem van de Meent is hiring for a postdoc position in AI methods for sustainability, including Bayesian optimization and experiment design, data-efficient surrogate modeling, probabilistic programming, and simulation-based inference. This position is part of the ELiAS program. You can apply here.

Recent Publications

ICML

Controlled Generation with Equivariant Variational Flow Matching

Eijkelboom, Floor, Zimmermann, Heiko, Bekkers, Erik, Welling, Max, Naesseth, Christian, and van de Meent, Jan-Willem

In International Conference on Machine Learning 2025
ICML

Exponential Family Variational Flow Matching for Tabular Data Generation

Guzmán-Cordero*, Andrés, Eijkelboom*, Floor, and van de Meent, Jan-Willem

In International Conference on Machine Learning 2025
ICML

Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems

Zhdanov, Maksim, Welling, Max, and van de Meent, Jan-Willem

In International Conference on Machine Learning 2025

Large-scale physical systems defined on irregular grids pose significant scalability challenges for deep learning methods, especially in the presence of long-range interactions and multi-scale coupling. Traditional approaches that compute all pairwise interactions, such as attention, become computationally prohibitive as they scale quadratically with the number of nodes. We present Erwin, a hierarchical transformer inspired by methods from computational many-body physics, which combines the efficiency of tree-based algorithms with the expressivity of attention mechanisms. Erwin employs ball tree partitioning to organize computation, which enables linear-time attention by processing nodes in parallel within local neighborhoods of fixed size. Through progressive coarsening and refinement of the ball tree structure, complemented by a novel cross-ball interaction mechanism, it captures both fine-grained local details and global features. We demonstrate Erwin’s effectiveness across multiple domains, including cosmology, molecular dynamics, and particle fluid dynamics, where it consistently outperforms baseline methods both in accuracy and computational efficiency.
PRL

Learning Neural Free-Energy Functionals with Pair-Correlation Matching

Dijkman, Jacobus, Dijkstra, Marjolein, van Roij, René, Welling, Max, van de Meent, Jan-Willem, and Ensing, Bernd

Physical Review Letters Feb 2025

The intrinsic Helmholtz free-energy functional, the centerpiece of classical density functional theory, is at best only known approximately for 3D systems. Here we introduce a method for learning a neural-network approximation of this functional by exclusively training on a dataset of radial distribution functions, circumventing the need to sample costly heterogeneous density profiles in a wide variety of external potentials. For a supercritical Lennard-Jones system with planar symmetry, we demonstrate that the learned neural free-energy functional accurately predicts inhomogeneous density profiles under various complex external potentials obtained from simulations.
ICRA

On-Robot Reinforcement Learning with Goal-Contrastive Rewards

Biza, Ondrej, Weng, Thomas, Sun, Lingfeng, Schmeckpeper, Karl, Kelestemur, Tarik, Ma, Yecheng Jason, Platt, Robert, van de Meent, Jan-Willem, and Wong, Lawson L. S.

In Proceedings of the 2025 IEEE International Conference on Robotics and Automation, ICRA’25 Feb 2025

ICLR

Approximating Full Conformal Prediction for Neural Network Regression with Gauss-Newton Influence

Tailor, Dharmesh, Correia, Alvaro, Nalisnick, Eric, and Louizos, Christos

In 13th International Conference on Learning Representations (to appear) Apr 2025

NeurIPS

Practical Shuffle Coding

Kunze, Julius, Severo, Daniel, van de Meent, Jan-Willem, and Townsend, James

In Advances in Neural Information Processing Systems Apr 2024

NeurIPS

VISA: Variational Inference with Sequential Sample-Average Approximations

Zimmermann, Heiko, Naesseth, Christian A., and van de Meent, Jan-Willem

In Advances in Neural Information Processing Systems Apr 2024

NAACL

Towards Reducing Diagnostic Errors with Interpretable Risk Prediction

McInerney, Denis Jered, Dickinson, William, Flynn, Lucy C., Young, Andrea C., Young, Geoffrey S., van de Meent, Jan-Willem, and Wallace, Byron C.

In 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) Apr 2024

Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.

CIKM

Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits

Mansoury, Masoud, Mobasher, Bamshad, and van Hoof, Herke

In ACM International Conference on Information and Knowledge Management Apr 2024

@inproceedings{mansoury2024mitigating,
  title = {Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits},
  author = {Mansoury, Masoud and Mobasher, Bamshad and van Hoof, Herke},
  year = {2024},
  date = {2024-10-21},
  booktitle = {ACM International Conference on Information and Knowledge Management},
  abbr = {CIKM},
  html = {https://dl.acm.org/doi/10.1145/3627673.3679763},
  bibtex_show = {true},
  arxiv = {https://arxiv.org/abs/2408.04332}
}

SIGIR

Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems

Huang, Jin, Oosterhuis, Harrie, Mansoury, Masoud, van Hoof, Herke, and de Rijke, Maarten

In International ACM SIGIR Conference on Research and Development in Information Retrieval Apr 2024

@inproceedings{huang2024going,
  title = {Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems},
  author = {Huang, Jin and Oosterhuis, Harrie and Mansoury, Masoud and van Hoof, Herke and de Rijke, Maarten},
  url = {https://arxiv.org/abs/2404.18640},
  year = {2024},
  date = {2024-07-14},
  booktitle = {International ACM SIGIR Conference on Research and Development in Information Retrieval},
  abbr = {SIGIR},
  html = {https://dl.acm.org/doi/10.1145/3626772.3657749},
  bibtex_show = {true},
  arxiv = {https://arxiv.org/abs/2404.18640}
}

NeurIPS

Fast yet Safe: Early-Exiting with Risk Control

Jazbec*, Metod, Timans*, Alexander, Veljković, Tin Hadži, Sakmann, Kaspar, Zhang, Dan, Naesseth, Christian A, and Nalisnick, Eric

Advances in Neural Information Processing Systems Dec 2024

@article{jazbec2024fastyetsafe,
  title = {Fast yet Safe: Early-Exiting with Risk Control},
  author = {Jazbec*, Metod and Timans*, Alexander and Veljkovi{\'c}, Tin Had{\v{z}}i and Sakmann, Kaspar and Zhang, Dan and Naesseth, Christian A and Nalisnick, Eric},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  year = {2024},
  month = dec,
  abbr = {NeurIPS},
  annotation = {* Equal contribution},
  arxiv = {2405.20915},
  bibtex_show = {true}
}

NeurIPS

Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling

Bartosh, Grigory, Vetrov, Dmitry, and Naesseth, Christian A

Advances in Neural Information Processing Systems Dec 2024

@article{bartosh2024neural,
  title = {Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling},
  author = {Bartosh, Grigory and Vetrov, Dmitry and Naesseth, Christian A},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  year = {2024},
  month = dec,
  abbr = {NeurIPS},
  arxiv = {2404.12940},
  bibtex_show = {true}
}

NeurIPS

Variational Flow Matching for Graph Generation

Eijkelboom*, Floor, Bartosh*, Grigory, Naesseth, Christian Andersson, Welling, Max, and van de Meent, Jan-Willem

Advances in Neural Information Processing Systems Dec 2024

@article{eijkelboom2024variational,
  title = {Variational Flow Matching for Graph Generation},
  author = {Eijkelboom*, Floor and Bartosh*, Grigory and Naesseth, Christian Andersson and Welling, Max and van de Meent, Jan-Willem},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  year = {2024},
  month = dec,
  abbr = {NeurIPS},
  annotation = {* Equal contribution},
  arxiv = {2406.04843},
  bibtex_show = {true}
}

NeurIPS

Equivariant Neural Diffusion for Molecule Generation

Cornet, François RJ, Bartosh, Grigory, Schmidt, Mikkel N, and Naesseth, Christian A

Advances in Neural Information Processing Systems Dec 2024

@article{cornet2024equivariant,
  title = {Equivariant Neural Diffusion for Molecule Generation},
  author = {Cornet, Fran{\c{c}}ois RJ and Bartosh, Grigory and Schmidt, Mikkel N and Naesseth, Christian A},
  journal = {Advances in Neural Information Processing Systems},
  volume = {37},
  year = {2024},
  month = dec,
  abbr = {NeurIPS},
  html = {https://openreview.net/forum?id=3iih8PGAH7},
  bibtex_show = {true}
}

ICML

Neural Diffusion Models

Bartosh, Grigory, Vetrov, Dmitry, and Naesseth, Christian A

The 41st International Conference on Machine Learning (ICML) Jul 2024

@article{bartosh2023neural,
  title = {Neural Diffusion Models},
  author = {Bartosh, Grigory and Vetrov, Dmitry and Naesseth, Christian A},
  journal = {The 41st International Conference on Machine Learning (ICML)},
  year = {2024},
  month = jul,
  abbr = {ICML},
  arxiv = {2310.08337},
  bibtex_show = {true}
}

UAI

Early-Exit Neural Networks with Nested Prediction Sets

Jazbec, Metod, Forré, Patrick, Mandt, Stephan, Zhang, Dan, and Nalisnick, Eric

The 40th Conference on Uncertainty in Artificial Intelligence Aug 2024

@article{jazbec2024nestedeenns,
  title = {Early-Exit Neural Networks with Nested Prediction Sets},
  author = {Jazbec, Metod and Forr{\'e}, Patrick and Mandt, Stephan and Zhang, Dan and Nalisnick, Eric},
  journal = {The 40th Conference on Uncertainty in Artificial Intelligence},
  year = {2024},
  month = aug,
  abbr = {UAI},
  arxiv = {2311.05931},
  bibtex_show = {true}
}

ICAPS

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Kuric, D., Infante, G., Gómez, V., Jonsson, A., and van Hoof, H.

In International Conference on Automated Planning and Scheduling Aug 2024

AAMAS

Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

Loftin, Robert, Çelikok, Mustafa Mert, van Hoof, Herke, Kaski, Samuel, and Oliehoek, Frans

In Artificial Agents and Multi-Agent Systems (AAMAS) Aug 2024

AISTATS

Learning to Defer to a Population: A Meta-Learning Approach

Tailor, Dharmesh, Patra, Aditya, Verma, Rajeev, Manggala, Putra, and Nalisnick, Eric

In 27th International Conference on Artificial Intelligence and Statistics May 2024

ICLR

Entropy Coding of Unordered Data Structures

Kunze, Julius, Severo, Daniel, Zani, Giulio, van de Meent, Jan-Willem, and Townsend, James

In International Conference on Learning Representations (ICLR) May 2024
ECML

Learning Hierarchical Planning-Based Policies from Offline Data

Woehlke, J., Schmitt, F., and van Hoof, H.

In Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD) May 2023

EMNLP

CHiLL: Zero-shot Custom Interpretable Feature Extraction from Clinical Notes with Large Language Models

McInerney, Denis Jered, Young, Geoffrey, van de Meent, Jan-Willem, and Wallace, Byron

In The 2023 Conference on Empirical Methods in Natural Language Processing (to appear) May 2023
EMNLP

Aligning Predictive Uncertainty with Clarification Questions in Grounded Dialog

Naszadi, Kata, Manggala, Putra, and Monz, Christof

In The 2023 Conference on Empirical Methods in Natural Language Processing (to appear) Dec 2023
NeurIPS

Implicit Neural Convolutional Kernels for Steerable CNNs

Zhdanov, Maksim, Hoffmann, Nico, and Cesa, Gabriele

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Flow Factorized Representation Learning

Song, Yue, Keller, T Anderson, Sebe, Nicu, and Welling, Max

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Rotating Features for Object Discovery

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Latent Field Discovery in Interacting Dynamical Systems with Neural Fields

Kofinas, Miltiadis, Bekkers, Erik J, Nagaraja, Naveen Shankar, and Gavves, Efstratios

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Towards Anytime Classification in Early-Exit Architectures by Enforcing Conditional Monotonicity

Jazbec, Metod, Allingham, James Urquhart, Zhang, Dan, and Nalisnick, Eric

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning

Feng, Fan, and Magliacane, Sara

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Invariant Neural Ordinary Differential Equations

Auzina, Ilze Amanda, Yıldız, Çağatay, Magliacane, Sara, Bethge, Matthias, and Gavves, Efstratios

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Clifford group equivariant neural networks

Ruhe, David, Brandstetter, Johannes, and Forré, Patrick

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Deep Gaussian Markov Random Fields for Graph-Structured Dynamical Systems

Lippert, Fiona, Kranstauber, Bart, van Loon, E Emiel, and Forré, Patrick

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

Wu, Luhuan, Trippe, Brian L, Naesseth, Christian A, Blei, David M, and Cunningham, John P

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023
NeurIPS

Topological Obstructions and How to Avoid Them

Esmaeili, Babak, Walters, Robin, Zimmermann, Heiko, and van de Meent, Jan-Willem

In Thirty-seventh Conference on Neural Information Processing Systems (to appear) Dec 2023

NeurIPS

The Memory-Perturbation Equation: Understanding Model’s Sensitivity to Data

Nickl, Peter, Xu, Lu*, Tailor, Dharmesh*, Möllenhoff, Thomas, and Khan, Mohammad Emtiyaz

In Thirty-seventh Conference on Neural Information Processing Systems Dec 2023

CoRL

One-shot Imitation Learning via Interaction Warping

Biza, Ondrej, Thompson, Skye, Pagidi, Kishore Reddy, Kumar, Abhinav, van der Pol, Elise, Walters, Robin, Kipf, Thomas, van de Meent, Jan-Willem, Wong, Lawson L.S., and Platt, Robert

In 7th Annual Conference on Robot Learning Nov 2023

UAI

Exploiting Inferential Structure in Neural Processes

Tailor, Dharmesh, Khan, Mohammad Emtiyaz, and Nalisnick, Eric

In The 39th Conference on Uncertainty in Artificial Intelligence Aug 2023

ACT

String Diagrams with Factorized Densities

Sennesh, Eli, and van de Meent, Jan-Willem

In Applied Category Theory Jul 2023

A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.
TMLR

Reusable Options through Gradient-based Meta Learning

Kuric, David, and van Hoof, Herke

Transactions on Machine Learning Research Mar 2023

TMLR

A Variational Perspective on Generative Flow Networks

Zimmermann, Heiko, Lindsten, Fredrik, van de Meent, Jan-Willem, and Naesseth, Christian A

Transactions on Machine Learning Research Apr 2023

Generative flow networks (GFNs) are a class of probabilistic models for sequential sampling of composite objects, proportional to a target distribution that is defined in terms of an energy function or a reward. GFNs are typically trained using a flow matching or trajectory balance objective, which matches forward and backward transition models over trajectories. In this work we introduce a variational objective for training GFNs, which is a convex combination of the reverse- and forward KL divergences, and compare it to the trajectory balance objective when sampling from the forward- and backward model, respectively. We show that, in certain settings, variational inference for GFNs is equivalent to minimizing the trajectory balance objective, in the sense that both methods compute the same score-function gradient. This insight suggests that in these settings, control variates, which are commonly used to reduce the variance of score-function gradient estimates, can also be used with the trajectory balance objective. We evaluate our findings and the performance of the proposed variational objective numerically by comparing it to the trajectory balance objective on two synthetic tasks.
ICLR

Bridge the Inference Gaps of Neural Processes via Expectation Maximization

Wang, Qi, Federici, Marco, and van Hoof, Herke

In International Conference on Learning Representations Apr 2023

ICLR

Sampling-Based Inference for Large Linear Models, with Application to Linearised Laplace

Antorán, Javier, Padhy, Shreyas, Barbano, Riccardo, Nalisnick, Eric, Janz, David, and Hernández-Lobato, José Miguel

In International Conference on Learning Representations Apr 2023

AISTATS

Do Bayesian Neural Networks Need To Be Fully Stochastic?

Sharma, Mrinank, Farquhar, Sebastian, Nalisnick, Eric, and Rainforth, Tom

In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics Apr 2023

AISTATS

Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles

Verma, Rajeev, Barrejón, Daniel, and Nalisnick, Eric

In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics Apr 2023

ECML

Learning objective-specific active learning strategies with Attentive Neural Processes

Bakker, Tim, van Hoof, Herke, and Welling, Max

In Proceedings of the European Conference on Machine Learning Sep 2023

NeurIPS Workshop

Active Learning Policies for Solving Inverse Problems

Bakker, T., Hehn, T., Orekondy, T., Behboodi, A., and Massoli, F. Valerio

In Neural Information Processing Systems Workshop on Adaptive Experimental Design and Active Learning in the Real World Dec 2023

NeurIPS Workshop

Switching policies for solving inverse problems

Bakker, T., Massoli, F. Valerio, Hehn, T., Orekondy, T., and Behboodi, A.

In Neural Information Processing Systems Workshop on Deep Learning and Inverse Problems Dec 2023

PLOS Comp Bio

Probabilistic Program Inference in Network-Based Epidemiological Simulations

Smedemark-Margulies, Niklas, Walters, Robin, Zimmermann, Heiko, Laird, Lucas, Loo, Christian, Kaushik, Neela, Caceres, Rajmonda, and van de Meent, Jan-Willem

PLOS Computational Biology Nov 2022

Accurate epidemiological models require parameter estimates that account for mobility patterns and social network structure. We demonstrate the effectiveness of probabilistic programming for parameter inference in these models. We consider an agent-based simulation that represents mobility networks as degree-corrected stochastic block models, whose parameters we estimate from cell phone co-location data. We then use probabilistic program inference methods to approximate the distribution over disease transmission parameters conditioned on reported cases and deaths. Our experiments demonstrate that the resulting models improve the quality of fit in multiple geographies relative to baselines that do not model network topology.
IJCNN

Logic-based AI for Interpretable Board Game Winner Prediction with Tsetlin Machine

Giri, Charul, Granmo, Ole-Christopher, van Hoof, Herke, and Blakely, Christian D.

In International Joint Conference on Neural Networks Nov 2022

NeurIPS

Learning Expressive Meta-Representations with Mixture of Expert Neural Processes

Wang, Qi, and van Hoof, Herke

In Advances in Neural Information Processing Systems Nov 2022

Neural processes (NPs) formulate exchangeable stochastic processes and are promising models for meta learning that do not require gradient updates during the testing phase. However, most NP variants place a strong emphasis on a global latent variable. This weakens the approximation power and restricts the scope of applications using NP variants, especially when data generative processes are complicated. To resolve these issues, we propose to combine the Mixture of Expert models with Neural Processes to develop more expressive exchangeable stochastic processes, referred to as Mixture of Expert Neural Processes (MoE-NPs). Then we apply MoE-NPs to both few-shot supervised learning and meta reinforcement learning tasks. Empirical results demonstrate MoE-NPs’ strong generalization capability to unseen tasks in these benchmarks.
NeurIPS

Factored Adaptation for Non-Stationary Reinforcement Learning

Feng, Fan, Huang, Biwei, Zhang, Kun, and Magliacane, Sara

In Advances in Neural Information Processing Systems Nov 2022

Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors, and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaption approach that learns jointly both the causal structure in terms of a factored MDP, and a factored representation of the individual time-varying change factors. We prove that under standard assumptions, we can completely recover the causal graph representing the factored transition and reward function, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can consider general non-stationary scenarios with different function types and changing frequency, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of return, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
NeurIPS

Neural Topological Ordering for Computation Graphs

Gagrani, Mukul, Rainone, Corrado, Yang, Yang, Teague, Harris, Jeon, Wonseok, van Hoof, Herke, Zeng, Weiliang Will, Zappi, Piero, Lott, Christopher, and Bondesan, Roberto

In Advances in Neural Information Processing Systems Nov 2022

ICML

Equivariant Diffusion for Molecule Generation in 3D

Hoogeboom, Emiel, Satorras, Víctor Garcia, Vignac, Clément, and Welling, Max

In International Conference on Machine Learning Nov 2022

This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
ICML

Lie Point Symmetry Data Augmentation for Neural PDE Solvers

Brandstetter, Johannes, Welling, Max, and Worrall, Daniel E

In International Conference on Machine Learning Nov 2022

Neural networks are increasingly being used to solve partial differential equations (PDEs), replacing slower numerical solvers. However, a critical issue is that neural PDE solvers require high-quality ground truth data, which usually must come from the very solvers they are designed to replace. Thus, we are presented with a proverbial chicken-and-egg problem. In this paper, we present a method, which can partially alleviate this problem, by improving neural PDE solver sample complexity – Lie point symmetry data augmentation (LPSDA). In the context of PDEs, it turns out that we are able to quantitatively derive an exhaustive list of data transformations, based on the Lie point symmetry group of the PDEs in question, something not possible in other application areas. We present this framework and demonstrate how it can easily be deployed to improve neural PDE solver sample complexity by an order of magnitude.
ICML

Exploiting Redundancy: Separable Group Convolutional Networks on Lie Groups

Knigge, David M, Romero, David W, and Bekkers, Erik J

International Conference on Machine Learning Nov 2022

Group convolutional neural networks (G-CNNs) have been shown to increase parameter efficiency and model accuracy by incorporating geometric inductive biases. In this work, we investigate the properties of representations learned by regular G-CNNs, and show considerable parameter redundancy in group convolution kernels. This finding motivates further weight-tying by sharing convolution kernels over subgroups. To this end, we introduce convolution kernels that are separable over the subgroup and channel dimensions. In order to obtain equivariance to arbitrary affine Lie groups we provide a continuous parameterisation of separable convolution kernels. We evaluate our approach across several vision datasets, and show that our weight sharing leads to improved performance and computational efficiency. In many settings, separable G-CNNs outperform their non-separable counterpart, while only using a fraction of their training time. In addition, thanks to the increase in computational efficiency, we are able to implement G-CNNs equivariant to the Sim(2) group; the group of dilations, rotations and translations. Sim(2)-equivariance further improves performance on all tasks considered.
ICML

CITRIS: Causal Identifiability from Temporal Intervened Sequences

Lippe, Phillip, Magliacane, Sara, Löwe, Sindy, Asano, Yuki M, Cohen, Taco, and Gavves, Efstratios

International Conference on Machine Learning Nov 2022

Understanding the latent causal factors of a dynamical system from visual observations is a crucial step towards agents reasoning in complex environments. In this paper, we propose CITRIS, a variational autoencoder framework that learns causal representations from temporal sequences of images in which underlying causal factors have possibly been intervened upon. In contrast to the recent literature, CITRIS exploits temporality and observing intervention targets to identify scalar and multidimensional causal factors, such as 3D rotation angles. Furthermore, by introducing a normalizing flow, CITRIS can be easily extended to leverage and disentangle representations obtained by already pretrained autoencoders. Extending previous results on scalar causal factors, we prove identifiability in a more general setting, in which only some components of a causal factor are affected by interventions. In experiments on 3D rendered image sequences, CITRIS outperforms previous methods on recovering the underlying causal variables. Moreover, using pretrained autoencoders, CITRIS can even generalize to unseen instantiations of causal factors, opening future research areas in sim-to-real generalization for causal representation learning.
ICML

Learning Symmetric Embeddings for Equivariant World Models

Park, Jung Yeon, Biza, Ondrej, Zhao, Linfeng, van de Meent, Jan-Willem, and Walters, Robin

In International Conference on Machine Learning Nov 2022

Abs PDF

Incorporating symmetries can lead to highly data-efficient and generalizable models by defining equivalence classes of data samples related by transformations. However, characterizing how transformations act on input data is often difficult, limiting the applicability of equivariant models. We propose learning symmetric embedding networks (SENs) that encode an input space (e.g. images), where we do not know the effect of transformations (e.g. rotations), to a feature space that transforms in a known manner under these operations. This network can be trained end-to-end with an equivariant task network to learn an explicitly symmetric representation. We validate this approach in the context of equivariant transition models with 3 distinct forms of symmetry. Our experiments demonstrate that SENs facilitate the application of equivariant networks to data with complex symmetry representations. Moreover, doing so can yield improvements in accuracy and generalization relative to both fully-equivariant and non-equivariant baselines.
ICML

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning

Antoran, Javier, Janz, David, Allingham, James Urquhart, Daxberger, Erik, Barbano, Riccardo, Nalisnick, Eric, and Hernandez-Lobato, Jose Miguel

In Proceedings of the 39th International Conference on Machine Learning Nov 2022

Abs HTML PDF

The linearised Laplace method for estimating model uncertainty has received renewed attention in the Bayesian deep learning community. The method provides reliable error bars and admits a closed-form expression for the model evidence, allowing for scalable selection of model hyperparameters. In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. We show that these interact poorly with some now-standard tools of deep learning (stochastic approximation methods and normalisation layers) and make recommendations for how to better adapt this classic method to the modern setting. We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.
ICML

Calibrated Learning to Defer with One-vs-All Classifiers

Verma, Rajeev, and Nalisnick, Eric

In Proceedings of the 39th International Conference on Machine Learning Nov 2022

Abs HTML PDF

The learning to defer (L2D) framework has the potential to make AI systems safer. For a given input, the system can defer the decision to a human if the human is more likely than the model to take the correct action. We study the calibration of L2D systems, investigating if the probabilities they output are sound. We find that Mozannar & Sontag’s (2020) multiclass framework is not calibrated with respect to expert correctness. Moreover, it is not even guaranteed to produce valid probabilities due to its parameterization being degenerate for this purpose. We propose an L2D system based on one-vs-all classifiers that is able to produce calibrated probabilities of expert correctness. Furthermore, our loss function is also a consistent surrogate for multiclass L2D, like Mozannar & Sontag’s (2020). Our experiments verify that not only is our system calibrated, but this benefit comes at no cost to accuracy. Our model’s accuracy is always comparable (and often superior) to Mozannar & Sontag’s (2020) model’s in tasks ranging from hate speech detection to galaxy classification to diagnosis of skin lesions.
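The deferral rule described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical model that outputs one logit per class plus one extra logit for "the expert is correct", each passed through its own sigmoid in one-vs-all fashion. The function names and the exact decision rule are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_or_defer(class_logits, defer_logit):
    """Return ("defer", p_expert) or (class_index, p_class).

    Assumption: each logit feeds an independent one-vs-all sigmoid, and the
    system defers when the estimated probability that the expert is correct
    exceeds the model's most confident class probability.
    """
    class_probs = [sigmoid(z) for z in class_logits]
    expert_prob = sigmoid(defer_logit)
    best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
    if expert_prob > class_probs[best_class]:
        return "defer", expert_prob
    return best_class, class_probs[best_class]


confident = predict_or_defer([2.0, -1.0, 0.0], 0.5)
uncertain = predict_or_defer([0.1, -1.0, 0.0], 1.5)
```

In this sketch `confident` keeps the model's own prediction, while `uncertain` hands the decision to the human because the expert-correctness probability is higher than any class probability.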
ICML

Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models and Amortized Policy Search

Wang, Q., and van Hoof, H.

In International Conference on Machine Learning Nov 2022

Abs PDF

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework and propose a new posterior sampling based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), achieves competitive dynamics prediction performance with lower model complexity. Moreover, our approach in policy search is able to obtain high returns and allows fast execution by avoiding test-time policy gradient updates.
CLeaR

Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data

Löwe, S., Madras, D., Zemel, R., and Welling, M.

Causal Learning and Reasoning Nov 2022

Abs HTML PDF Code

On time-series data, most causal discovery methods fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information which is lost when following this approach. Specifically, different samples may share the dynamics which describe the effects of their causal relations. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus leverages the shared dynamics information. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under added noise and hidden confounding.
ICLR

Geometric and Physical Quantities improve E(3) Equivariant Message Passing

Brandstetter, Johannes, Hesselink, Rob, van der Pol, Elise, Bekkers, Erik, and Welling, Max

In International Conference on Learning Representations Nov 2022

Abs HTML PDF Code

Including covariant information, such as position, force, velocity or spin, is important in many tasks in computational physics and chemistry. We introduce Steerable E(3) Equivariant Graph Neural Networks (SEGNNs) that generalise equivariant graph networks, such that node and edge attributes are not restricted to invariant scalars, but can contain covariant information, such as vectors or tensors. This model, composed of steerable MLPs, is able to incorporate geometric and physical information in both the message and update functions. Through the definition of steerable node attributes, the MLPs provide a new class of activation functions for general use with steerable feature fields. We discuss our work and related work through the lens of equivariant non-linear convolutions, which further allows us to pinpoint the successful components of SEGNNs: non-linear message aggregation improves upon classic linear (steerable) point convolutions; steerable messages improve upon recent equivariant graph networks that send invariant messages. We demonstrate the effectiveness of our method on several tasks in computational physics and chemistry and provide extensive ablation studies.
ICLR

Self-Supervised Inference in State-Space Models

Ruhe, David, and Forré, Patrick

In International Conference on Learning Representations Nov 2022

Abs HTML PDF

We perform approximate inference in state-space models with nonlinear state transitions. Without parameterizing a generative model, we apply Bayesian update formulas using a local linearity approximation parameterized by neural networks. This approach is accompanied by a maximum likelihood objective that requires no supervision via uncorrupted observations or ground-truth latent states. The optimization backpropagates through a recursion similar to the classical Kalman filter and smoother. Additionally, using an approximate conditional independence, we can perform smoothing without having to parameterize a separate model. In scientific applications, domain knowledge can give a linear approximation of the latent transition maps, which we can easily incorporate into our model. Usage of such domain knowledge is reflected in excellent results (despite our model’s simplicity) on the chaotic Lorenz system compared to fully supervised and variational inference methods. Finally, we show competitive results on an audio denoising experiment.
ASCOM

Detecting dispersed radio transients in real time using convolutional neural networks

Ruhe, David, Kuiack, Mark, Rowlinson, Antonia, Wijers, Ralph, and Forré, Patrick

Astronomy and Computing Nov 2022

HTML PDF
ICLR

Multi-Agent MDP Homomorphic Networks

van der Pol, Elise, van Hoof, Herke, Oliehoek, Frans, and Welling, Max

In International Conference on Learning Representations Nov 2022

Abs HTML PDF

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a permutation of the optimal joint policy. Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting, because such approaches rely on the global symmetry in the full state-action spaces, and these can result in correspondences across agents. To encode such symmetries while still allowing distributed execution we propose a factorization that decomposes global symmetries into local transformations. Our proposed factorization allows for distributing the computation that enforces global symmetries over local agents and local interactions. We introduce a multi-agent equivariant policy network based on this factorization. We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines.
CPAIOR

Deep Policy Dynamic Programming for Vehicle Routing Problems

Kool, Wouter, van Hoof, Herke, Gromicho, Joaquim, and Welling, Max

In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research Nov 2022

PDF Code
AAAI

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Long, Alex, Blair, Alan, and van Hoof, Herke

In AAAI National Conference on Artificial Intelligence Nov 2022

HTML PDF
IJCAI

Value Refinement Network (VRN)

Wöhlke, Jan, Schmitt, Felix, and van Hoof, Herke

In International Joint Conference on Artificial Intelligence Nov 2022

HTML
IJCAI

Leveraging class abstraction for commonsense reinforcement learning via residual policy gradient methods

Höpner, Niklas, Tiddi, Ilaria, and van Hoof, Herke

In International Joint Conference on Artificial Intelligence Nov 2022

HTML PDF
MIDL

On learning adaptive acquisition policies for undersampled multi-coil MRI reconstruction

Bakker, T., Muckley, M., Romero-Soriano, A., Drozdzal, M., and Pineda, L.

In Proceedings of Machine Learning Research Jul 2022

PDF Code

Selected Publications

NeurIPS

MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

van der Pol, Elise, Worrall, Daniel, van Hoof, Herke, Oliehoek, Frans, and Welling, Max

In Advances in Neural Information Processing Systems 2020

HTML PDF
ICLR

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Kool, Wouter, van Hoof, Herke, and Welling, Max

In International Conference on Learning Representations 2020

Abs HTML PDF Code Video

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
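As a small illustration of the sampling step this abstract relies on, the sketch below draws k distinct categories without replacement using the Gumbel-top-k trick. This is one common way to sample without replacement and is used here as an assumption. It does not implement the paper's estimator or its control variate, and the function name is hypothetical.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Draw k distinct indices by perturbing log-probabilities with Gumbel noise.

    Taking the k largest perturbed values yields an ordered sample without
    replacement, so no index can appear twice.
    """
    perturbed = []
    for index, log_p in enumerate(log_probs):
        # random() returns values in [0, 1); clamp away from 0 to avoid log(0).
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((log_p + gumbel, index))
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


probs = [0.5, 0.3, 0.15, 0.05]
sample = gumbel_top_k([math.log(p) for p in probs], k=3, rng=random.Random(0))
```

Because every index is perturbed once and selected at most once, `sample` never contains duplicates, which is the property the abstract credits for the variance reduction.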
ICLR

B-Spline CNNs on Lie groups

Bekkers, Erik J

In International Conference on Learning Representations 2019

Abs HTML PDF Code Video

Group convolutional neural networks (G-CNNs) can be used to improve classical CNNs by equipping them with the geometric structure of groups. Central in the success of G-CNNs is the lifting of feature maps to higher dimensional disentangled representations, in which data characteristics are effectively learned, geometric data-augmentations are made obsolete, and predictable behavior under geometric transformations (equivariance) is guaranteed via group theory. Currently, however, the practical implementations of G-CNNs are limited to either discrete groups (that leave the grid intact) or continuous compact groups such as rotations (that enable the use of Fourier theory). In this paper we lift these limitations and propose a modular framework for the design and implementation of G-CNNs for arbitrary Lie groups. In our approach the differential structure of Lie groups is used to expand convolution kernels in a generic basis of B-splines that is defined on the Lie algebra. This leads to a flexible framework that enables localized, atrous, and deformable convolutions in G-CNNs by means of respectively localized, sparse and non-uniform B-spline expansions. The impact and potential of our approach is studied on two benchmark datasets: cancer detection in histopathology slides (PCam dataset) in which rotation equivariance plays a key role and facial landmark localization (CelebA dataset) in which scale equivariance is important. In both cases, G-CNN architectures outperform their classical 2D counterparts and the added value of atrous and localized group convolutions is studied in detail.
AISTATS

Structured Disentangled Representations

Esmaeili, Babak, Wu, Hao, Jain, Sarthak, Bozkurt, Alican, Siddharth, N., Paige, Brooks, Brooks, Dana H., Dy, Jennifer, and van de Meent, Jan-Willem

Artificial Intelligence and Statistics 2019

Abs HTML PDF

Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.
ICLR

Do Deep Generative Models Know What They Don’t Know?

Nalisnick, Eric, Matsukawa, Akihiro, Teh, Yee Whye, Gorur, Dilan, and Lakshminarayanan, Balaji

In International Conference on Learning Representations 2019

HTML PDF
ESWC

Modeling Relational Data with Graph Convolutional Networks

Schlichtkrull, Michael, Kipf, Thomas N., Bloem, Peter, van den Berg, Rianne, Titov, Ivan, and Welling, Max

In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece 2018

Abs HTML PDF

Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to handle the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
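For readers unfamiliar with the DistMult decoder mentioned in this abstract, the sketch below shows its scoring function for a (subject, relation, object) triple. The embedding values are made up for illustration. In the setting described above, the entity embeddings would come from an R-GCN encoder, which is not reproduced here.

```python
def distmult_score(subject, relation, obj):
    """DistMult score: a trilinear product with a diagonal relation matrix.

    All three arguments are equal-length lists of floats. A higher score
    means the triple is considered more plausible.
    """
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


# Hypothetical 2-dimensional embeddings, chosen only for illustration.
subject_embedding = [1.0, 2.0]
relation_embedding = [0.5, -1.0]
object_embedding = [2.0, 1.0]

score = distmult_score(subject_embedding, relation_embedding, object_embedding)
reverse_score = distmult_score(object_embedding, relation_embedding, subject_embedding)
```

Note that `score` and `reverse_score` are equal: because the relation acts as a diagonal matrix, DistMult cannot distinguish the direction of a relation.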
NeurIPS

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Siddharth, N., Paige, Brooks, van de Meent, Jan-Willem, Desmaison, Alban, Goodman, Noah D., Kohli, Pushmeet, Wood, Frank, and Torr, Philip

In Advances in Neural Information Processing Systems 30 2017

Abs HTML PDF Code

Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework’s ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets.
ICLR

Semi-supervised classification with graph convolutional networks

Kipf, Thomas N, and Welling, Max

In International Conference on Learning Representations 2017

HTML PDF
ICML

Group Equivariant Convolutional Networks

Cohen, Taco, and Welling, Max

In Proceedings of The 33rd International Conference on Machine Learning 20–22 Jun 2016

Abs HTML PDF

We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST.
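The weight sharing and equivariance described in this abstract can be checked on a toy example. The sketch below is a pure-Python illustration for the four 90-degree rotations only, with hypothetical helper names and made-up integer data. It is not the authors' implementation and omits translations beyond valid-mode correlation, reflections, and deeper layers.

```python
def rot90(grid):
    """Rotate a square 2D list by 90 degrees counter-clockwise."""
    n = len(grid)
    return [[grid[c][n - 1 - r] for c in range(n)] for r in range(n)]


def correlate(image, kernel):
    """Valid-mode 2D cross-correlation of square lists."""
    k = len(kernel)
    m = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(m)
        ]
        for i in range(m)
    ]


def lift_rot4(image, kernel):
    """Correlate the image with all four rotated copies of ONE shared kernel."""
    maps = []
    rotated = kernel
    for _ in range(4):
        maps.append(correlate(image, rotated))
        rotated = rot90(rotated)
    return maps


# Made-up integer data so that equality can be checked exactly.
image = [[(3 * r + 5 * c * c) % 7 for c in range(5)] for r in range(5)]
kernel = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]

plain = lift_rot4(image, kernel)
turned = lift_rot4(rot90(image), kernel)
# Rotating the input rotates every feature map and cyclically shifts
# the four rotation channels by one position.
equivariant = all(turned[r] == rot90(plain[(r - 1) % 4]) for r in range(4))
```

A single 3x3 kernel produces four feature maps here, which is the extra weight sharing the abstract refers to, and `equivariant` confirms that a rotated input yields a predictably transformed output.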
NeurIPS

Improved Variational Inference with Inverse Autoregressive Flow

Kingma, Durk P, Salimans, Tim, Jozefowicz, Rafal, Chen, Xi, Sutskever, Ilya, and Welling, Max

In Advances in Neural Information Processing Systems 2016

HTML PDF
NeurIPS

Semi-Supervised Learning with Deep Generative Models

Kingma, Durk P, Mohamed, Shakir, Jimenez Rezende, Danilo, and Welling, Max

In Advances in Neural Information Processing Systems 2014

HTML PDF
ICLR

Auto-Encoding Variational Bayes

Kingma, Diederik P., and Welling, Max

2013

HTML PDF
ICML

Bayesian learning via stochastic gradient langevin dynamics

Welling, Max, and Teh, Yee Whye

In Proceedings of the 28th International Conference on Machine Learning, ICML 2011

HTML PDF Video