decisions and improve data efficiency, with applications for example in the medical domain.

**Abstract**: In this talk, we will introduce our research work in the “playful data-driven active urban living” (PAUL) project. Targeting the issue of physical inactivity in modern society, we aim to motivate less active people to engage in more physical activity. After giving an overview of the project, we will mainly present our recent papers.

Driven by a large-scale dataset of Dutch people’s running records (over 10K people over about 4 years), we start by characterizing runners based on their different temporal activity patterns. Then, with respect to these diverse users, we study how environmental situations (time, weather, geographical and social information) at the start time of a run affect the running distance. A rule-based machine learning method is applied to capture combinations of situations frequently associated with relevant long-distance runs. These environmental situations will then be used in a mobile system to identify the ‘right timing’ for motivating people to start longer-distance runs via message interventions.
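As a rough illustration of the rule-mining idea (not the project’s actual pipeline), the sketch below fits a shallow decision tree on synthetic situational features and prints its if-then rules; the feature names, the synthetic data, and the notion of a “long run” are assumptions made only for this example.

```python
# Minimal sketch (not the PAUL project's actual method): mine simple
# rules linking environmental situations at the start of a run to
# long-distance runs. Features and threshold are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 2000
runs = pd.DataFrame({
    "temperature_c": rng.normal(12, 6, n),   # weather
    "start_hour": rng.integers(6, 22, n),    # time of day
    "is_weekend": rng.integers(0, 2, n),     # temporal/social context
    "near_park": rng.integers(0, 2, n),      # geographical context
})
# Synthetic target: longer runs are more likely on mild weekend mornings.
long_run_prob = (
    0.2
    + 0.3 * runs["is_weekend"]
    + 0.2 * ((runs["temperature_c"] > 8) & (runs["temperature_c"] < 18))
    + 0.1 * (runs["start_hour"] < 10)
)
runs["is_long_run"] = rng.random(n) < long_run_prob  # e.g. distance > 10 km

# A shallow decision tree yields human-readable if-then rules,
# one simple stand-in for a rule-based learner.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(runs.drop(columns="is_long_run"), runs["is_long_run"])
print(export_text(tree, feature_names=list(runs.columns[:-1])))
```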

**Abstract**: In this talk we will present the main results of two of our recent papers.

We will introduce a flexible class of general structural causal models that allows for nonlinear (and linear) functional relations (e.g., neural networks), arbitrary probability distributions (discrete, continuous, mixtures, etc.), causal cycles (e.g., feedback) and latent variables (a.k.a. confounders). For such models we will demonstrate several desirable properties, show how to do causal reasoning, present the rules of the do-calculus, and give graphical criteria for conditional independence relations. We will also show how the latter can be exploited by causal discovery algorithms in such a general context.
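To make the notion of causal reasoning in a structural causal model concrete, here is a minimal toy sketch (our own illustration, not the paper’s framework) comparing an observational conditional with an interventional quantity in the presence of a latent confounder. For simplicity the example is acyclic, whereas the models in the talk also allow cycles; all structural equations below are assumptions of the example.

```python
# Toy structural causal model with a latent confounder H:
# observational E[Y | X=1] differs from interventional E[Y | do(X=1)].
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def sample(do_x=None):
    h = rng.normal(size=n)                               # latent confounder
    if do_x is None:
        x = (h + rng.normal(size=n) > 0).astype(float)   # structural eq. for X
    else:
        x = np.full(n, float(do_x))                      # intervention do(X = do_x)
    y = 2.0 * x + 1.5 * h + rng.normal(size=n)           # structural eq. for Y
    return x, y

x_obs, y_obs = sample()
print("E[Y | X=1]     ≈", y_obs[x_obs == 1].mean())      # biased by the confounder
_, y_do = sample(do_x=1)
print("E[Y | do(X=1)] ≈", y_do.mean())                   # causal effect ≈ 2.0
```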

**Abstract**: In this highly informal seminar I would like to pitch the question “Can a machine learning system develop a theory?” One of the much-touted properties of deep learning networks is that their deeper levels develop higher-order, generalized representations of their inputs. This raises the question of whether they are able to hit upon the type of hidden structures in physical problems that are the cornerstone of effective physical theories.

I would like to propose to test this idea in a concrete setting related to the highly relevant question of inverse design of self-assembling matter. I have recently formulated a novel approach to inferring the specific short-range isotropic interactions between particles of multiple types on lattices of given geometry, such that they spontaneously form specified periodic states of essentially arbitrary complexity. This approach rests upon the subtle intertwining of the group of transformations that leave the lattice structure invariant with the group of permutations of the set of particle types induced by these same transformations on the target ordered structure. The upshot of this approach is that the number of independent coupling constants in the lattice can be systematically reduced from O(N^{2}), where N is the number of distinct species, to O(N). The idea would be to see whether a machine learning approach, which uses the space of possible patterns and their trivial transforms under symmetry operations as input, the set of possible coupling constants as output, and feedback based on the degree to which the target structure is realized with these coupling constants, is able to “learn” the symmetry-based rules in a way that also generalizes to similar patterns not included in the training set.
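As a purely hypothetical illustration of the counting argument (not the construction in the talk), the sketch below treats the lattice symmetries as permutations of the particle types and counts the orbits of unordered type pairs; couplings within an orbit must coincide, which is what can reduce O(N^2) constants to roughly O(N). The cyclic permutation used as a generator is a toy assumption.

```python
# Toy orbit counting: type pairs related by induced permutations must share
# a coupling constant, so independent couplings = orbits of unordered pairs.
from itertools import combinations_with_replacement

def orbits_of_pairs(n_types, generators):
    """Group the (i, j) couplings into orbits under the given permutations."""
    pairs = list(combinations_with_replacement(range(n_types), 2))
    seen, orbits = set(), []
    for pair in pairs:
        if pair in seen:
            continue
        orbit, frontier = {pair}, [pair]
        while frontier:                      # closure under all generators
            i, j = frontier.pop()
            for g in generators:
                image = tuple(sorted((g[i], g[j])))
                if image not in orbit:
                    orbit.add(image)
                    frontier.append(image)
        seen |= orbit
        orbits.append(sorted(orbit))
    return orbits

N = 6
cyclic_shift = [(k + 1) % N for k in range(N)]   # toy induced permutation
orbits = orbits_of_pairs(N, [cyclic_shift])
n_pairs = len(list(combinations_with_replacement(range(N), 2)))
print(f"{n_pairs} raw couplings -> {len(orbits)} independent ones")  # 21 -> 4 here
```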

**Abstract**:

The aim of the DeeBMED project was to develop a powerful automatic medical imaging tool that can cope with the main problems associated with complex images such as medical scans, namely: multimodality of the data distribution, a large number of dimensions combined with a small number of examples, a small amount of labeled data, multi-source learning, and robustness to transformations. To counteract these issues, I proposed to use a probabilistic framework, namely the Variational Auto-Encoder (VAE), that combines deep learning and Bayesian inference. Within the project I have followed two lines of research:

– Development of the VAE by:

* enriching the encoder (Householder flow, Sylvester flow, Hyperspherical VAE);

* enriching the prior (VampPrior);

* enriching the decoder (an ongoing work with Rianne van den Berg & Christos Louizos);

* learning fair representations (Hierarchical VampPrior VFAE);

* learning disentangled representations (an ongoing work with Maximilian Ilse).

– Development of deep neural networks by:

* learning from large images, i.e., ~10,000×10,000 pixels (Deep MIL, and an ongoing work with Nathan Ing, Arkadiusz Gertych, Beatrice Knudsen);

* learning from multiple sources, e.g., different views (an ongoing work with Henk van Voorst).

During the talk I will outline the assumptions of the DeeBMED project and its successes. At the end, a possible direction for future work will be presented.
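To give a flavour of the “enriching the prior” line listed above, here is a minimal PyTorch sketch in the spirit of the VampPrior: the prior is a mixture of encoder posteriors evaluated at learnable pseudo-inputs. The toy encoder and all sizes are illustrative assumptions, not the models used in the project.

```python
# Minimal VampPrior-style prior: p(z) = (1/K) sum_k q(z | u_k) with learnable
# pseudo-inputs u_k. Illustrative sketch, not the project's exact model.
import math
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, x_dim=784, z_dim=16, h_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)

    def forward(self, x):
        h = self.net(x)
        return self.mu(h), self.logvar(h)

class VampPriorSketch(nn.Module):
    def __init__(self, encoder, n_pseudo=50, x_dim=784):
        super().__init__()
        self.encoder = encoder
        # Learnable pseudo-inputs u_1..u_K living in data space.
        self.pseudo_inputs = nn.Parameter(torch.randn(n_pseudo, x_dim) * 0.01)

    def log_prior(self, z):
        # log p(z) = logsumexp_k log N(z | mu_k, sigma_k^2) - log K,
        # with (mu_k, sigma_k) given by the encoder at the pseudo-inputs.
        mu, logvar = self.encoder(self.pseudo_inputs)   # (K, z_dim)
        z = z.unsqueeze(1)                              # (B, 1, z_dim)
        log_comp = -0.5 * (logvar + (z - mu) ** 2 / logvar.exp()
                           + math.log(2 * math.pi)).sum(-1)
        return torch.logsumexp(log_comp, dim=1) - math.log(self.pseudo_inputs.shape[0])

encoder = ToyEncoder()
prior = VampPriorSketch(encoder)
z = torch.randn(8, 16)
print(prior.log_prior(z).shape)   # torch.Size([8])
```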

**Acknowledgments**:

During the project I have had the great pleasure of publishing with the following people (in alphabetical order):

* the University of Amsterdam: Rianne van den Berg, Philip Botros, Nicola de Cao, Tim Davidson, Luca Falorsi, Shi Hu, Maximilian Ilse, Thomas Kipf, Max Welling;

* the University of Oxford: Leonard Hasenclever;

* the Academic Medical Center in Amsterdam: Onno de Boer, Sybren Meijer;

* the Cedars-Sinai Medical Center in Los Angeles: Arkadiusz Gertych, Nathan Ing, Beatrice Knudsen.

Last but not least, all former and current members of AMLAB, QUVA Lab, Delta Lab and Philips Lab made my project successful through multiple discussions, meetings and seminars.

**Abstract**: Control of multidimensional systems typically relies on accurately engineered models. Breaking this requirement is problematic with neural networks, as the usual Gaussian data assumptions typically do not hold. In my talk I will demonstrate how this problem can be efficiently solved by combining latent variable models with a specific type of optimal control. The theory is demonstrated on various simulated closed-loop control systems as well as on real hardware.
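The abstract does not spell out which type of optimal control is used; as one hedged illustration of combining a latent-state model with optimal control, the sketch below assumes linear-Gaussian latent dynamics (as a learned model might provide) and applies a discrete-time LQR controller in latent space. All matrices are toy assumptions.

```python
# Toy combination of a latent dynamics model with discrete-time LQR control.
import numpy as np
from scipy.linalg import solve_discrete_are

# Assumed latent dynamics z' = A z + B u (e.g. obtained from a learned model).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)          # state cost
R = np.eye(1) * 0.01   # control cost

# Discrete-time LQR gain from the algebraic Riccati equation.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

z = np.array([1.0, 0.0])          # latent state, e.g. inferred by an encoder
for t in range(50):
    u = -K @ z                    # optimal control in latent space
    z = A @ z + B @ u
print("final latent state ≈", z)  # driven toward the origin
```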

**Abstract:** Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization, which is critical in deep learning. Typically, BO relies on conventional Gaussian process regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, Gaussian-process-based BO cannot leverage large numbers of past function evaluations, for example to warm-start related BO runs. After a brief introduction to BO and an overview of several use cases at Amazon, I will discuss a multi-task adaptive Bayesian linear regression model, whose computational complexity is attractive (linear) in the number of function evaluations and which can leverage information from related black-box functions through a shared deep neural net. Experimental results show that the neural net learns a representation suitable for warm-starting related BO runs, and that these runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than competing neural-net-based methods recently published in the literature.

This is joint work with Valerio Perrone, Rodolphe Jenatton, and Matthias Seeger. It will be presented at NIPS 2018.
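As a rough, self-contained sketch of why Bayesian linear regression scales linearly in the number of evaluations (unlike the cubic cost of exact GP regression), the loop below runs Thompson-sampling BO on a toy black-box function. The random Fourier features stand in for the shared neural-net representation described in the talk and are an assumption of this example, as is the whole toy setup.

```python
# Toy BO with Bayesian linear regression on a fixed feature map: the posterior
# update costs O(n d^2) in the number n of evaluations (linear in n).
import numpy as np

rng = np.random.default_rng(0)

def features(x, W, b):
    """Random Fourier feature map (assumed stand-in for a learned net)."""
    return np.cos(x[:, None] * W + b)

def blr_posterior(Phi, y, alpha=1.0, noise=0.1):
    """Closed-form posterior over weights; cost linear in len(y)."""
    A = alpha * np.eye(Phi.shape[1]) + Phi.T @ Phi / noise**2
    mean = np.linalg.solve(A, Phi.T @ y) / noise**2
    return mean, np.linalg.inv(A)

f = lambda x: np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)  # noisy black box
W, b = rng.normal(size=64) * 3, rng.uniform(0, 2 * np.pi, 64)
X = rng.uniform(-1, 1, 5)                                     # initial design

for _ in range(20):                        # BO loop with Thompson sampling
    y = f(X)
    mean, cov = blr_posterior(features(X, W, b), y)
    w = rng.multivariate_normal(mean, cov)                    # posterior sample
    cand = np.linspace(-1, 1, 200)
    X = np.append(X, cand[np.argmin(features(cand, W, b) @ w)])

print("best observed value:", f(X).min())
```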

**Bio:** Cedric is the science lead of Amazon Core AI, with teams in Berlin, Barcelona, Tuebingen, and Seattle. His work on democratizing machine learning enables teams at Amazon to deliver a wide range of machine-learning-based products, including customer-facing services that are part of Amazon SageMaker (aws.amazon.com/sagemaker). Currently, he is interested in algorithms that learn representations, algorithms that learn to learn, and algorithms that avoid catastrophic forgetting (in deep learning). Prior to joining Amazon, he led the Machine Learning group at Xerox Research Centre Europe (now Naver Labs Europe), where his team conducted applied research in machine learning, computational statistics and mechanism design, with applications in customer care, transportation and governmental services. He joined Amazon, Berlin, as an Applied Science Manager in October 2013, where he was in charge of delivering zero-parameter machine learning algorithms.

You are all cordially invited to the AMLab seminar on **Thursday September 6** at 16:00 in C3.163 (FNWI, Amsterdam Science Park), where **Joris Mooij** will give a talk titled “**Validating Causal Discovery Methods**”. Afterwards there are the usual drinks and snacks.

**Abstract:**

Since the pioneering work of Peirce and Fisher, the gold standard for causal discovery has been the randomized experiment. An intriguing alternative approach to causal discovery was proposed in the nineties, based on conditional independence patterns in the data. Over the past decades, dozens of causal discovery methods based on that idea have been proposed. These methods clearly work on simulated data when all their assumptions are satisfied; demonstrating their usefulness on real data, however, has been a challenge. In this talk, I will discuss some of our recent attempts at validating causal discovery methods on large-scale interventional data sets from molecular biology. I will discuss a micro-array gene expression data set and a mass cytometry data set that, at first sight, seem perfectly suited for validating causal discovery methods. As it turns out, however, both causal discovery on these data and the validation of such methods are more challenging than one might initially think. We find that even sophisticated modern causal discovery algorithms are outperformed by simple (non-causal) baselines on these data sets.

(joint work with Philip Versteeg and Tineke Blom)
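One way to picture the validation problem is the following toy protocol (an assumption of this note, not the paper’s exact setup): simulate a known linear model, predict the effect of an intervention from observational data with two simple predictors, and score both against held-out interventional ground truth.

```python
# Toy validation protocol: compare predicted intervention effects against
# held-out interventional data generated from a known linear model.
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 5000

# Random upper-triangular linear model: X_j depends only on earlier variables.
Wtrue = np.triu(rng.normal(scale=0.5, size=(p, p)), k=1)

def simulate(n, do0=None):
    X = np.zeros((n, p))
    for j in range(p):
        X[:, j] = X @ Wtrue[:, j] + rng.normal(size=n)
        if j == 0 and do0 is not None:
            X[:, 0] = do0                       # hard intervention on X_0
    return X

obs = simulate(n)                               # observational data
true_effect = simulate(n, do0=2.0).mean(0) - simulate(n, do0=0.0).mean(0)

# Two simple predictors of the effect of do(X_0 := 2) vs do(X_0 := 0):
reg_pred = 2.0 * np.array([np.polyfit(obs[:, 0], obs[:, j], 1)[0] for j in range(p)])
cor_pred = 2.0 * np.array([np.corrcoef(obs[:, 0], obs[:, j])[0, 1] for j in range(p)])

for name, pred in [("regression", reg_pred), ("correlation", cor_pred)]:
    rmse = np.sqrt(np.mean((pred - true_effect) ** 2))
    print(name, "RMSE vs interventional ground truth:", rmse)
```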

**Abstract:**

As machine learning models grow in size and complexity, and as applications reach critical social, economic and public health domains, learning interpretable data representations is becoming ever more important. Most of the current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. In our recent ICML-2018 paper, we proposed two rather contrasting interpretability frameworks. The first aims at controlling the accuracy vs. interpretability tradeoff by providing an interpretable lens for an existing model (which has already been optimized for accuracy). We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. Applying a flexible and invertible transformation to the input leads to an interpretable representation with no loss in accuracy. We extend the approach using an active learning strategy to choose the most useful side information to obtain, allowing a human to guide what “interpretable” means. The second framework relies on joint optimization for a representation which is both maximally informative about the side information and maximally compressive about the non-interpretable data factors. This leads to a novel perspective on the relationship between compression and regularization. We also propose an interpretability evaluation metric based on our frameworks. Empirically, we achieve state-of-the-art results on three datasets using the two proposed algorithms.
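As a hedged illustration of the second framework’s “informative yet compressive” objective, the sketch below uses an information-bottleneck-style loss: a stochastic representation is trained to predict the side information while a KL term toward a standard normal prior compresses away other factors. The dimensions, the weight `beta`, and the architecture are assumptions, not the paper’s exact model.

```python
# Information-bottleneck-style objective: informative about side labels,
# compressive about everything else. Illustrative sketch only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBEncoder(nn.Module):
    def __init__(self, x_dim=20, z_dim=4, n_side_classes=3):
        super().__init__()
        self.mu = nn.Linear(x_dim, z_dim)
        self.logvar = nn.Linear(x_dim, z_dim)
        self.side_head = nn.Linear(z_dim, n_side_classes)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        return self.side_head(z), mu, logvar

def ib_loss(logits, side_labels, mu, logvar, beta=1e-2):
    # informativeness: predict the (interpretable) side information from z
    ce = F.cross_entropy(logits, side_labels)
    # compression: KL(q(z|x) || N(0, I)) penalizes keeping other factors
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
    return ce + beta * kl

model = IBEncoder()
x = torch.randn(32, 20)
side = torch.randint(0, 3, (32,))
logits, mu, logvar = model(x)
print(ib_loss(logits, side, mu, logvar))
```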

**Short bio:**

Tameem Adel is currently a research fellow in the Machine Learning Group at the University of Cambridge, advised by Prof. Zoubin Ghahramani. He was previously an AMLAB postdoctoral researcher advised by Prof. Max Welling, and he obtained his PhD from the University of Waterloo, Ontario, Canada, advised by Prof. Ali Ghodsi. His main research interests revolve around probabilistic graphical models, Bayesian learning and inference, medical (especially MRI-based) applications of machine learning, interpretability of deep models, and domain adaptation.

**Abstract**: We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
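A minimal sketch of the continuous-depth idea (a toy stand-in, not the paper’s implementation): the hidden state evolves as dh/dt = f(h, t) given by a small neural net, integrated here with a fixed-step RK4 solver and differentiated by ordinary backpropagation through the solver steps rather than the paper’s memory-efficient adjoint method. Layer sizes and step counts are illustrative assumptions.

```python
# Toy continuous-depth block: neural ODE integrated with fixed-step RK4.
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, h, t):
        # concatenate time so the dynamics can depend on t
        t_col = torch.full_like(h[:, :1], float(t))
        return self.net(torch.cat([h, t_col], dim=1))

def odeint_rk4(func, h0, t0=0.0, t1=1.0, steps=10):
    """Fixed-step Runge-Kutta 4 integration of dh/dt = func(h, t)."""
    h, dt = h0, (t1 - t0) / steps
    for i in range(steps):
        t = t0 + i * dt
        k1 = func(h, t)
        k2 = func(h + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = func(h + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = func(h + dt * k3, t + dt)
        h = h + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return h

func = ODEFunc()
h0 = torch.randn(16, 8, requires_grad=True)
h1 = odeint_rk4(func, h0)          # "continuous-depth" forward pass
h1.sum().backward()                # gradients flow through the solver
print(h0.grad.shape)               # torch.Size([16, 8])
```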