AMLab | Amsterdam Machine Learning Lab

Herke van Hoof

Associate professor
AMLab
Informatics Institute
University of Amsterdam
Science Park, Lab 42, L4.05

I am an associate professor at the University of Amsterdam in the Netherlands. My group works on various aspects of modular reinforcement learning. Reinforcement learning is a very general framework, but the price of that generality is typically low data efficiency. To address this, we investigate how to exploit modular structures, including hierarchical ones. Such structures allow knowledge to be transferred between tasks and prior knowledge to be exploited, so that more can be learned from less data. We are also interested in applying reinforcement learning to domains with structured states or actions, for example learning heuristics for combinatorial problem solving.

Selected Publications

ICAPS

Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Kuric, D., Infante, G., Gómez, V., Jonsson, A., and van Hoof, H.

In International Conference on Automated Planning and Scheduling Jul 2024

TMLR

Reusable Options through Gradient-based Meta Learning

Kuric, David, and van Hoof, Herke

Transactions on Machine Learning Research Jul 2023

ICLR

Multi-Agent MDP Homomorphic Networks

van der Pol, Elise, van Hoof, Herke, Oliehoek, Frans, and Welling, Max

In International Conference on Learning Representations Jul 2022

This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information, yet is able to share experience between global symmetries in the joint state-action space of cooperative multi-agent systems. In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations. For example, consider a group of agents navigating: rotating the state globally results in a permutation of the optimal joint policy. Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting, because such approaches rely on the global symmetry in the full state-action spaces, and these can result in correspondences across agents. To encode such symmetries while still allowing distributed execution we propose a factorization that decomposes global symmetries into local transformations. Our proposed factorization allows for distributing the computation that enforces global symmetries over local agents and local interactions. We introduce a multi-agent equivariant policy network based on this factorization. We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines.
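
The symmetry argument above can be illustrated with a toy example: when the whole joint state is rotated by 90°, each agent's local observation (its offsets to nearby agents) is rotated and each agent's action is permuted, so a local policy that respects this local transformation also respects the global one while still running on local information only. This is a minimal sketch, not the paper's architecture: it enforces equivariance by naive group averaging (symmetrizing a hypothetical `base_policy` over the four rotations) instead of equivariant layers with message passing between agents, and the grid positions, `radius`, and features are assumptions chosen for illustration.

```python
import math

# Actions ordered so that a 90° counter-clockwise rotation maps action a to (a + 1) % 4.
ACTIONS = [(0, 1), (-1, 0), (0, -1), (1, 0)]  # up, left, down, right


def rotate(v):
    """Rotate a 2-D vector by 90 degrees counter-clockwise."""
    x, y = v
    return (-y, x)


def local_obs(positions, i, radius=2.0):
    """Offsets from agent i to every other agent within `radius` (local information only)."""
    xi, yi = positions[i]
    return [(x - xi, y - yi) for j, (x, y) in enumerate(positions)
            if j != i and (x - xi) ** 2 + (y - yi) ** 2 <= radius * radius]


def base_policy(obs):
    """Hypothetical, deliberately non-equivariant local policy: softmax over ad hoc features."""
    logits = []
    for a, (ax, ay) in enumerate(ACTIONS):
        score = 0.3 * a  # asymmetric bias that breaks rotational symmetry
        for dx, dy in obs:
            score += 0.7 * (ax * dx + ay * dy) - 0.2 * ax * dy
        logits.append(score)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def equivariant_policy(obs):
    """Group-average base_policy over the 4 rotations:
    pi(o)[a] = 1/4 * sum_r base(rot^r o)[(a + r) % 4]."""
    probs = [0.0] * 4
    rotated = list(obs)
    for r in range(4):
        p = base_policy(rotated)  # rotated == rot^r applied to obs
        for a in range(4):
            probs[a] += p[(a + r) % 4] / 4
        rotated = [rotate(v) for v in rotated]
    return probs


def joint_policy(positions, policy):
    """Distributed execution: each agent acts on its own local observation."""
    return [policy(local_obs(positions, i)) for i in range(len(positions))]


if __name__ == "__main__":
    positions = [(0, 0), (1, 0), (1, 1), (3, 2)]
    before = joint_policy(positions, equivariant_policy)
    after = joint_policy([rotate(p) for p in positions], equivariant_policy)
    # After a global rotation, each agent's action distribution is shifted by one action.
    for pb, pa in zip(before, after):
        print([round(x, 4) for x in pb], [round(x, 4) for x in pa])
```

Symmetrizing this way costs four policy evaluations per decision; building the constraint into the network layers, as the abstract describes, avoids that overhead.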
IJCAI

Value Refinement Network (VRN)

Wöhlke, Jan, Schmitt, Felix, and van Hoof, Herke

In International Joint Conference on Artificial Intelligence Jul 2022

IJCAI

Leveraging Class Abstraction for Commonsense Reinforcement Learning via Residual Policy Gradient Methods

Höpner, Niklas, Tiddi, Ilaria, and van Hoof, Herke

In International Joint Conference on Artificial Intelligence Jul 2022

ICLR

Estimating Gradients for Discrete Random Variables by Sampling without Replacement

Kool, Wouter, van Hoof, Herke, and Welling, Max

In International Conference on Learning Representations Jul 2020

We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement, which reduces variance as it avoids duplicate samples. We show that our estimator can be derived as the Rao-Blackwellization of three different estimators. Combining our estimator with REINFORCE, we obtain a policy gradient estimator and we reduce its variance using a built-in control variate which is obtained without additional model evaluations. The resulting estimator is closely related to other gradient estimators. Experiments with a toy problem, a categorical Variational Auto-Encoder and a structured prediction problem show that our estimator is the only estimator that is consistently among the best estimators in both high and low entropy settings.
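
The estimator described above can be illustrated with a brute-force sketch. It assumes the unordered set form of the estimator: each distinct sampled element s is weighted by p(s) times a leave-one-out ratio, the probability of drawing the other sampled elements once s is removed from the domain, divided by the probability of drawing the whole set. Set probabilities are computed here by enumerating every ordering, which scales factorially in the sample size and is an assumption made for clarity, not an efficient method. The function names are hypothetical, and the sketch only estimates E[f(x)]; it does not cover the REINFORCE gradient or the built-in control variate.

```python
import itertools
import random


def ordered_prob(seq, p, excluded=()):
    """Probability of drawing `seq` in exactly this order, without replacement,
    from `p` renormalized over the domain minus the items in `excluded`."""
    remaining = 1.0 - sum(p[e] for e in excluded)
    prob = 1.0
    for s in seq:
        prob *= p[s] / remaining
        remaining -= p[s]
    return prob


def set_prob(subset, p, excluded=()):
    """Probability of drawing `subset` as an unordered set (sum over all orderings).
    Brute force: factorial in len(subset), so only suitable for small samples."""
    return sum(ordered_prob(order, p, excluded)
               for order in itertools.permutations(subset))


def unordered_set_estimate(sample, p, f):
    """Estimate E_{x~p}[f(x)] from distinct samples drawn without replacement.
    Weight of s: p(s) * P(sample minus s | s removed from domain) / P(sample)."""
    total = set_prob(sample, p)
    estimate = 0.0
    for s in sample:
        rest = [x for x in sample if x != s]
        ratio = set_prob(rest, p, excluded=(s,)) / total
        estimate += p[s] * ratio * f(s)
    return estimate


def sample_without_replacement(p, k, rng):
    """Draw k distinct indices sequentially, renormalizing over the remaining items."""
    remaining = list(range(len(p)))
    sample = []
    for _ in range(k):
        choice = rng.choices(remaining, weights=[p[i] for i in remaining], k=1)[0]
        sample.append(choice)
        remaining.remove(choice)
    return sample


if __name__ == "__main__":
    p = [0.5, 0.3, 0.15, 0.05]
    f = lambda x: (x + 1) ** 2
    sample = sample_without_replacement(p, 2, random.Random(0))
    print(sample, unordered_set_estimate(sample, p, f))
```

For a small domain, averaging the estimate over every possible sample set, weighted by its set probability, recovers the exact expectation, which is a quick way to check unbiasedness. With k = 1 the estimate reduces to plain Monte Carlo, f(s).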
ICML

Addressing Function Approximation Error in Actor-Critic Methods

Fujimoto, S., van Hoof, H., and Meger, D.

In International Conference on Machine Learning Jul 2018

ICML

An Inference-Based Policy Gradient Method for Learning Options

Smith, M., van Hoof, H., and Pineau, J.

In International Conference on Machine Learning Jul 2018

JMLR

Non-parametric Policy Search with Limited Information Loss

Van Hoof, H., Neumann, G., and Peters, J.

Journal of Machine Learning Research Jul 2017
