Author Archives: Thijs van Ommen

Talk by Taco Cohen

You are all cordially invited to the AMLab seminar on Tuesday March 14 at 16:00 in C3.163, where Taco Cohen will give a talk titled “Group Equivariant & Steerable CNNs”. Afterwards there are the usual drinks and snacks!

Abstract: Deep learning can be very effective, but typically requires large amounts of labelled data, which can be costly to collect. This is not only a major practical limitation to the applicability of deep learning, but also a fundamental barrier to AI: rapid learning is an essential part of intelligence.

In this talk I will present group equivariant networks, a natural generalization of convolutional networks that achieves improved statistical efficiency by exploiting symmetries like rotation and reflection. Instead of using convolutions, these networks use group equivariant convolutions. Group equivariant convolutions are easy to use, fast, and can be converted to standard convolutions after training. We show that simply replacing translational convolutions with group equivariant convolutions can improve image classification results. In the second part of the talk I will show how group equivariant nets can be scaled up to very large symmetry groups using steerable filters.
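As a rough illustration of what a group equivariant convolution does, here is a minimal sketch. It is not the speaker's implementation. It assumes the symmetry group p4 (translations plus rotations by multiples of 90 degrees), a single input channel, and only the first ("lifting") layer. The function names are hypothetical.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def p4_lifting_correlation(image, kernel):
    """Correlate the image with all four 90-degree rotations of one kernel.

    Returns an array of shape (4, H', W'): one feature map per rotation.
    """
    return np.stack(
        [correlate2d_valid(image, np.rot90(kernel, r)) for r in range(4)]
    )

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
kernel = rng.normal(size=(3, 3))
features = p4_lifting_correlation(image, kernel)
features_of_rotated = p4_lifting_correlation(np.rot90(image), kernel)
```

In this sketch, rotating the input by 90 degrees rotates every feature map and cyclically shifts the rotation axis: `features_of_rotated[r]` equals `np.rot90(features[(r - 1) % 4])`. That predictable transformation of the output is what equivariance means here.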

Talk by Karen Ullrich

You are all cordially invited to the AMLab seminar on Tuesday March 7 at 16:00 in C3.163, where Karen Ullrich will give a talk titled “Soft Weight-Sharing for Neural Network Compression”. Afterwards there are the usual drinks and snacks!

Abstract: The success of deep learning in numerous application domains has created the desire to run and train neural networks on mobile devices. This, however, conflicts with their computation-, memory- and energy-intensive nature, leading to a growing interest in compression. Recent work by Han et al. (2015a) proposes a pipeline that involves retraining, pruning and quantization of neural network weights, obtaining state-of-the-art compression rates. In this paper, we show that competitive compression rates can be achieved by using a version of “soft weight-sharing” (Nowlan & Hinton, 1992). Our method achieves both quantization and pruning in one simple (re-)training procedure. This point of view also exposes the relation between compression and the minimum description length (MDL) principle.
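To give a feel for soft weight-sharing, the sketch below computes the negative log-likelihood of a set of weights under a mixture-of-Gaussians prior, which is the kind of penalty that pulls weights towards a few shared cluster centres. It is an illustration only: the function names, the fixed mixture parameters and the nearest-mean quantization step are assumptions, not the procedure from the paper.

```python
import numpy as np

def mixture_penalty(weights, pis, mus, sigmas):
    """Negative log-likelihood of the weights under a 1D Gaussian mixture."""
    w = np.asarray(weights, dtype=float)[:, None]      # shape (n_weights, 1)
    pis, mus, sigmas = (np.asarray(a, dtype=float) for a in (pis, mus, sigmas))
    log_components = (
        np.log(pis)
        - 0.5 * np.log(2.0 * np.pi * sigmas ** 2)
        - 0.5 * ((w - mus) / sigmas) ** 2
    )                                                  # shape (n_weights, n_components)
    # Numerically stable log-sum-exp over the mixture components.
    top = log_components.max(axis=1, keepdims=True)
    log_mixture = top[:, 0] + np.log(np.exp(log_components - top).sum(axis=1))
    return -log_mixture.sum()

def quantize_to_nearest_mean(weights, mus):
    """Simplified quantization: snap each weight to the closest component mean."""
    w = np.asarray(weights, dtype=float)[:, None]
    mus = np.asarray(mus, dtype=float)
    return mus[np.argmin(np.abs(w - mus), axis=1)]

pis = np.array([1 / 3, 1 / 3, 1 / 3])
mus = np.array([-0.5, 0.0, 0.5])
sigmas = np.array([0.05, 0.05, 0.05])

clustered = [0.0, 0.5, -0.5]     # weights sitting on the cluster centres
scattered = [0.25, -0.25, 0.25]  # weights far from every centre
penalty_clustered = mixture_penalty(clustered, pis, mus, sigmas)
penalty_scattered = mixture_penalty(scattered, pis, mus, sigmas)
quantized = quantize_to_nearest_mean([0.02, 0.47, -0.6], mus)
```

Adding such a penalty to the training loss favours weights that cluster around the component means. A component centred at zero would encourage pruning, and snapping each weight to its cluster centre afterwards gives the quantization.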

Talk by ChangYong Oh

You are all cordially invited to the AMLab seminar on Tuesday February 28 at 16:00 in C3.163, where ChangYong Oh will give a talk titled “High dimensional Bayesian Optimization”. Afterwards there are the usual drinks and snacks!

Abstract: Bayesian optimization has been successful in many hyper-parameter optimization problems and reinforcement learning problems. Still, there are many obstacles which prevent it from being extensively applied. Among these obstacles, we focus on methods for high dimensional spaces. In order to resolve the difficulties of high dimensional Bayesian optimization problems, we devised a principled method to reduce the predictive variance of the Gaussian process, along with other assistive methods for its successful application.
Firstly, a brief explanation of general Bayesian optimization will be given. Secondly, I will explain the sources that make high dimensional problems harder, namely the ‘boundary effect’ and the ‘hollow ball problem’. Thirdly, I will propose solutions to those problems, namely ‘variance reduction’ and ‘adaptive search region’.

Talk by Artem Grotov

You are all cordially invited to the AMLab seminar on Tuesday February 21 at 16:00 in C3.163, where Artem Grotov will give a talk titled “Deep Counterfactual Learning”. Afterwards there are the usual drinks and snacks!

Abstract: Deep learning is increasingly important for training interactive systems such as search engines and recommenders. Deep models are applied to a broad range of tasks, including ranking, text similarity, and classification. Training a neural network to perform classification requires a lot of labeled data. While collecting large supervised labeled data sets is expensive and sometimes impossible, for example for personalized tasks, there often is an abundance of logged data collected from user interactions with an existing system. This type of data is called logged bandit feedback, and utilizing it is challenging because such data is noisy, biased and incomplete. We propose a learning method, Constrained Counterfactual Risk Minimisation (CCRM), based on counterfactual risk minimization of an empirical Bernstein bound, to tackle this problem and learn from logged bandit feedback. We evaluate CCRM on an image classification task. We find that CCRM performs well in practice and outperforms existing methods.
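For readers unfamiliar with learning from logged bandit feedback, the sketch below shows the generic ingredients: an inverse-propensity-scored (IPS) risk estimate plus a variance penalty in the style of an empirical Bernstein bound. This is not CCRM itself, whose constraint is not described in the abstract. The function name, the clipping constant and the penalty weight `lam` are assumptions made for illustration.

```python
import numpy as np

def crm_objective(losses, new_probs, logging_probs, lam=1.0, clip=10.0):
    """IPS risk estimate plus a sample-variance penalty.

    losses        : observed loss for each logged action
    new_probs     : probability the new policy assigns to the logged action
    logging_probs : probability the logging policy assigned to it
    """
    losses = np.asarray(losses, dtype=float)
    ratios = np.asarray(new_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    weights = np.minimum(ratios, clip)        # clipped importance weights
    weighted_losses = losses * weights
    n = weighted_losses.size
    ips_risk = weighted_losses.mean()
    variance_penalty = np.sqrt(weighted_losses.var(ddof=1) / n)
    return ips_risk + lam * variance_penalty, ips_risk

logged_losses = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
# Evaluating the logging policy on its own log: all importance weights equal 1.
objective, ips_risk = crm_objective(logged_losses, logging_probs, logging_probs)
```

The importance weights correct for the fact that the log only contains actions chosen by the old system. The variance term penalizes policies whose risk estimate rests on a few heavily weighted samples.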

Talk by Jakub Tomczak

You are all cordially invited to the AMLab seminar on Tuesday February 14 at 16:00 in C3.163, where Jakub Tomczak will give a talk titled “Improving Variational Auto-Encoders using volume-preserving flows: A preliminary study”. Afterwards there are the usual drinks and snacks!

Abstract: Variational auto-encoders (VAE) are scalable and powerful generative models. However, the choice of the variational posterior determines the tractability and flexibility of the VAE. Commonly, latent variables are modeled using the normal distribution with a diagonal covariance matrix. This is computationally efficient, but typically not flexible enough to match the true posterior distribution. One way of enriching the variational posterior distribution is to apply normalizing flows, i.e., a series of invertible transformations applied to latent variables with a simple posterior. Applying general normalizing flows requires calculating the Jacobian determinant, which can be computationally troublesome. However, it is possible to design a series of transformations for which the Jacobian determinant equals 1, so-called volume-preserving flows. During the presentation I will describe my preliminary results on a new volume-preserving flow called Householder flow and an extension of the linear Inverse Autoregressive Flow.
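The volume-preserving property can be checked numerically for the standard Householder reflection, H = I - 2 v vᵀ / (vᵀ v). The sketch below only illustrates that building block. How the vectors v are produced inside the VAE is not stated in the abstract, so here they are simply given, and the function names are hypothetical.

```python
import numpy as np

def householder_matrix(v):
    """Householder reflection H = I - 2 v v^T / (v^T v)."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)

def householder_flow(z, vectors):
    """Apply a sequence of Householder reflections to a latent vector z."""
    z = np.asarray(z, dtype=float)
    for v in vectors:
        z = householder_matrix(v) @ z
    return z

H = householder_matrix([1.0, 2.0, 2.0])
abs_det = abs(np.linalg.det(H))   # absolute Jacobian determinant of z -> H z

z0 = np.array([0.3, -1.2, 0.7])
zK = householder_flow(z0, [[1.0, 2.0, 2.0], [0.0, 1.0, -1.0]])
```

Each reflection is orthogonal, so the absolute value of its Jacobian determinant is 1 (the determinant itself is -1) and the flow adds no log-determinant term to the objective. A product of such reflections is again orthogonal, so `zK` has the same norm as `z0`.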

Slides (pdf)

Talk by Thijs van Ommen

You are all cordially invited to the AMLab seminar on Tuesday February 7 at 16:00 in C3.163, where Thijs van Ommen will give a talk titled “Recognizing linear structural equation models from observational data”. Afterwards there are the usual drinks and snacks!

Abstract: In a linear structural equation model, each variable is a linear function of other variables plus noise, and some noise terms may be correlated. Such a model can be represented by a mixed graph, with directed edges for causal relations and bidirected edges for correlated noise terms. Our goal is to learn the graph structure from observational data. To do this, we need to consider what constraints a model imposes on the observed covariance matrix. Some of these constraints do not correspond to (conditional) independences, and are not well understood. In particular, it is not even clear how to tell, by looking at two graphs, whether they impose exactly the same constraints. I will describe my progress in mapping out these models and their constraints.
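To make the link between a graph and its covariance matrix concrete, here is a small sketch. It assumes the convention X = Bᵀ X + ε, where `B[i, j]` is the coefficient on the directed edge i → j and `Omega` is the noise covariance (off-diagonal entries correspond to bidirected edges). The function name and the example coefficients are chosen for illustration.

```python
import numpy as np

def implied_covariance(B, Omega):
    """Covariance of X when X = B^T X + eps and Cov(eps) = Omega."""
    d = B.shape[0]
    A = np.linalg.inv(np.eye(d) - B.T)   # X = A @ eps
    return A @ Omega @ A.T

# Chain X0 -> X1 -> X2 with coefficients 2 and 3 and uncorrelated unit noise.
B = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
Omega = np.eye(3)
Sigma = implied_covariance(B, Omega)

# Constraint implied by the chain: X0 is independent of X2 given X1,
# so the partial covariance of X0 and X2 given X1 vanishes.
partial_cov_02_given_1 = Sigma[0, 2] - Sigma[0, 1] * Sigma[1, 2] / Sigma[1, 1]
```

The vanishing partial covariance is a conditional-independence constraint. Graphs with bidirected edges can additionally impose constraints on `Sigma` that are not of this form.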

Talk by Max Welling

You are all cordially invited to the AMLab seminar on Tuesday January 31 at 16:00 in C3.163, where Max Welling will give a talk titled “AMLAB/QUVA’s progress in Deep Learning”. Afterwards there are the usual drinks and snacks!

Abstract: I will briefly describe the progress that has been made in the past year in AMLAB and QUVA in terms of deep learning. I will try to convey a coherent story of how some of these projects tie together into a bigger vision for the field. I will end with research questions that seem interesting for future projects.

Talk by Marco Loog (TUD)

You are all cordially invited to the AMLab seminar on Tuesday January 24 at 16:00 in C3.163, where Marco Loog will give a talk titled “Semi-Supervision, Surrogate Losses, and Safety Guarantees”. Afterwards there are the usual drinks and snacks!

Abstract: Users of classification tools tend to forget [or worse, might not even realize] that classifiers typically do not minimize the 0-1 loss, but a surrogate that upper-bounds the classification error on the training set. Here we argue that we should also study these losses as such and we consider the problem of semi-supervised learning from this angle. In particular, we look at the basic setting of linear classifiers and convex margin-based losses, e.g. hinge, logistic, squared, etc. We investigate to what extent semi-supervision can be safe at least on the training set, i.e., we want to construct semi-supervised classifiers for which the empirical risk is never larger than the risk achieved by their supervised counterparts. [Based on work carried out together with Jesse Krijthe.]
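The upper-bound relation between surrogate losses and the 0-1 loss can be checked directly. The sketch below writes each loss as a function of the margin m = y · f(x) with labels y in {-1, +1}. It assumes the logistic loss is taken in base 2 so that it equals 1 at m = 0, and it counts a margin of exactly zero as an error.

```python
import numpy as np

def zero_one_loss(margin):
    """1 for a misclassified (or zero-margin) point, 0 otherwise."""
    return (margin <= 0).astype(float)

def hinge_loss(margin):
    return np.maximum(0.0, 1.0 - margin)

def logistic_loss(margin):
    # Base-2 logarithm so that the loss is exactly 1 at margin 0.
    return np.log1p(np.exp(-margin)) / np.log(2.0)

def squared_loss(margin):
    return (1.0 - margin) ** 2

margins = np.linspace(-3.0, 3.0, 61)
surrogates = {
    "hinge": hinge_loss(margins),
    "logistic": logistic_loss(margins),
    "squared": squared_loss(margins),
}
upper_bounds_hold = {
    name: bool(np.all(values >= zero_one_loss(margins) - 1e-12))
    for name, values in surrogates.items()
}
```

Because each surrogate lies above the 0-1 loss at every margin, the average surrogate loss on the training set upper-bounds the training error.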

Talk by Thomas Kipf

You are all cordially invited to the AMLab seminar on Tuesday December 13 at 16:00 in C3.163, where Thomas Kipf will give a talk titled “Deep Learning on Graphs with Graph Convolutional Networks”. Afterwards there are the usual drinks and snacks!

Abstract: Deep learning has recently enabled breakthroughs in the fields of computer vision and natural language processing. Little attention, however, has been devoted to the generalization of deep neural network-based models to datasets that come in the form of graphs or networks (e.g. social networks, knowledge graphs or protein-interaction networks). Generalizing convolutional neural networks, the workhorse of deep learning, to graph-structured data is not straightforward and a number of different approaches have been introduced (see [1] for an overview). I will review some of these models and introduce our own variant of graph convolutional networks [2] that achieves competitive performance on a number of semi-supervised node classification tasks. I will further talk about extensions to the basic graph convolutional framework, with special focus on our recently introduced variational graph auto-encoder [3]—a model for unsupervised learning and link prediction—and outline future research directions.

[1] Graph Convolutional Networks
[2] TN Kipf and M Welling, Semi-Supervised Classification with Graph Convolutional Networks, arXiv:1609.02907, 2016
[3] TN Kipf and M Welling, Variational Graph Auto-Encoders, NIPS Bayesian Deep Learning Workshop, 2016
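As a pointer to what a graph convolution computes, the sketch below implements a single layer following the propagation rule of [2]: H' = ReLU(D̂^(-1/2) Â D̂^(-1/2) H W), with Â = A + I and D̂ the degree matrix of Â. It is a dense NumPy illustration with hypothetical function names and toy inputs, not the authors' code.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency matrix with added self-loops."""
    A_hat = A + np.eye(A.shape[0])
    degree = A_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A, H, W):
    """One graph convolutional layer with a ReLU non-linearity."""
    return np.maximum(0.0, normalized_adjacency(A) @ H @ W)

# Toy graph: two nodes joined by one edge, one-hot node features.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H = np.eye(2)
W = np.array([[1.0, -1.0],
              [1.0, -1.0]])
H_next = gcn_layer(A, H, W)
```

Each node's new representation is a normalized average over itself and its neighbours, followed by a shared linear map and a non-linearity. Stacking several such layers lets information travel over longer paths in the graph.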

Talk by Sara Magliacane

You are all cordially invited to the AMLab seminar on Tuesday November 29 at 16:00 in C3.163, where Sara Magliacane will give a talk titled “Ancestral Causal Inference”. Afterwards there are the usual drinks and snacks!

Abstract: This is a practice talk for a ~12-minute general-audience talk at a NIPS workshop, so ideally it should require no previous knowledge of causality.

Discovering causal relations from data is at the foundation of the scientific method. Traditionally, cause-effect relations have been recovered from experimental data in which the variable of interest is perturbed, but seminal work like the do-calculus and the PC/FCI algorithms demonstrates that, under certain assumptions, it is already possible to obtain significant causal information by using only observational data.

Recently, there have been several proposals for combining observational and experimental data to discover causal relations. These causal discovery methods are usually divided into two categories: constraint-based and score-based methods. Score-based methods typically evaluate models using a penalized likelihood score, while constraint-based methods use statistical independences to express constraints over possible causal models. The advantages of constraint-based over score-based methods are the ability to handle latent confounders naturally, no need for parametric modeling assumptions and an easy integration of complex background knowledge, especially in the logic-based methods.

We propose Ancestral Causal Inference (ACI), a logic-based method that provides accuracy comparable to the best state-of-the-art constraint-based methods, but improves on their scalability by using a more coarse-grained representation of causal information. Furthermore, we propose a method to score predictions according to their confidence. We provide some theoretical guarantees for ACI, like soundness and asymptotic consistency, and demonstrate that it can outperform the state-of-the-art on synthetic data, achieving a speedup of several orders of magnitude. We illustrate its practical feasibility by applying it to a challenging protein data set that so far had only been addressed with score-based methods.