**Abstract:**

The computer-aided analysis of medical scans is a longstanding goal in the medical imaging field. Currently, deep learning has become a dominant methodology for supporting pathologists and radiologists. Deep learning algorithms have been successfully applied to digital pathology and radiology; nevertheless, practical issues still prevent these tools from being widely used. The main obstacles are the small number of available cases combined with the large size of images (a.k.a. the small n, large p problem in machine learning) and very limited access to pixel-level annotation, which can lead to severe overfitting and large computational requirements. We propose to handle these issues by introducing a framework that processes a medical image as a collection of small patches using a single, shared neural network. The final diagnosis is obtained by combining the scores of individual patches with a permutation-invariant combination operator. In the machine learning community, such an approach is called multi-instance learning (MIL).

During this presentation we will outline the definition of MIL and propose a learnable permutation-invariant operator based on the attention mechanism. We will present our preliminary results on a toy problem and on real-life histopathology data.
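As a rough illustration of what a learnable, permutation-invariant attention operator can look like, the sketch below pools a bag of patch embeddings with softmax attention weights. The parameterization (a `tanh` layer followed by a linear score) and the names `H`, `V`, `w` and `attention_mil_pool` are assumptions made for this example; the abstract does not specify the exact form used in the talk. The sketch pools embeddings, but pooling per-patch scores works the same way.

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Pool a bag of K patch embeddings H (K x D) into one vector of size D.

    V (L x D) and w (L,) stand in for learnable attention parameters
    (hypothetical names; random values are used below instead of training).
    """
    scores = np.tanh(H @ V.T) @ w        # one unnormalized score per patch, shape (K,)
    scores = scores - scores.max()       # shift for a numerically stable softmax
    a = np.exp(scores) / np.exp(scores).sum()  # attention weights, sum to 1
    z = a @ H                            # attention-weighted average of the patches
    return z, a

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 4))              # toy bag: 5 patches, 4-dim embeddings
V = rng.normal(size=(3, 4))
w = rng.normal(size=3)

z, a = attention_mil_pool(H, V, w)

# Shuffling the patches must not change the pooled representation.
perm = rng.permutation(5)
z_perm, _ = attention_mil_pool(H[perm], V, w)
```

Because the weights are computed per patch and the output is a weighted sum, reordering the patches leaves `z` unchanged, which is the permutation invariance that MIL requires.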

**Authors:**

Maximilian Ilse, Jakub Tomczak, Max Welling

**Abstract:**

Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting, namely, data is local and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data provider knows the target variable and (iii) entities are not linked across the data providers. Hence, to the challenge of private learning, we add the potentially negative consequences of mistakes in entity resolution.

Our contribution is twofold. First, we describe a three-party end-to-end solution in two phases (privacy-preserving entity resolution and federated logistic regression over messages encrypted with an additively homomorphic scheme) that is secure against an honest-but-curious adversary. The system allows learning without either exposing data in the clear or sharing which entities the data providers have in common. Our implementation is as accurate as a naive non-private solution that brings all data into one place, and it scales to problems with millions of entities and hundreds of features. Second, we provide a formal analysis of the impact of entity resolution on learning.
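To make "additively homomorphic" concrete, here is a toy sketch using the Paillier cryptosystem, one well-known scheme with this property. The abstract does not name the scheme used, so Paillier is an assumption, as are the function names and the tiny primes (which are far too small to be secure). The point is only that two encrypted partial scores can be summed without decrypting either one.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key pair from two small primes (illustration only, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)      # modular inverse; valid because we fix g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt a non-negative integer m < n under public key n (with g = n + 1)."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

n, priv = keygen(293, 433)

# Each provider encrypts its own partial score (fixed-point integers, made up here).
score_a = 1200
score_b = 345
c_sum = add_encrypted(n, encrypt(n, score_a), encrypt(n, score_b))
total = decrypt(n, priv, c_sum)   # only the key holder can read the sum
```

In a setting like the one described, the party holding the private key would see only aggregated quantities, never the individual providers' values.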

**Abstract**: Neural networks on graphs have gained renewed interest in the machine learning community. Recent results have shown that end-to-end trainable neural network models that operate directly on graphs can challenge well-established classical approaches, such as kernel-based methods or methods that rely on graph embeddings (e.g. DeepWalk). In this talk, I will motivate such an approach from an analogy to traditional convolutional neural networks and introduce our recent variant of graph convolutional networks (GCNs) that achieves promising results on a number of semi-supervised node classification tasks. If time permits, I will further introduce two extensions of this basic framework, namely: graph auto-encoders and relational GCNs. While graph auto-encoders provide a novel way of approaching problems like link prediction or recommendation, relational GCNs allow for efficient modeling of directed relational graphs, such as knowledge bases.
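A single graph convolution layer can be sketched in a few lines. The propagation rule below (self-loops plus symmetric degree normalization, followed by a ReLU) is the commonly published GCN formulation and is assumed here to be the variant meant; the function name and the toy graph are made up for illustration.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (N x N) adjacency matrix, H: (N x F) node features, W: (F x F') weights.
    """
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d = A_tilde.sum(axis=1)                     # node degrees including self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # aggregate neighbours, transform, ReLU

# Toy path graph 0 - 1 - 2 with one-hot features and identity weights,
# so the output is just the normalized adjacency matrix.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)
W = np.eye(3)
out = gcn_layer(A, H, W)
```

Each node's new representation is a normalized average over itself and its neighbours, which is the analogy to a convolution filter sliding over a local neighbourhood.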

**Abstract**: In this talk I want to summarize the thoughts and experiments from the last couple of weeks, in which we tried to develop a distributed variational inference algorithm. Although we can see theoretical advantages to the proposed model, and cannot immediately see theoretical reasons why it should not work, the experiments demonstrate that learning with the proposed algorithm is unstable and fails catastrophically in the tested settings. I would like to share our intuition and would be glad to discuss it and collect your ideas.

**Abstract**: In sequential decision making under uncertainty, an agent attempts to find some function that maps from states to actions, such that a reward signal is maximized, taking both immediate and future reward into account. Under the graph-based perspective, we view the problem of optimal sequential decision making as doing inference in a graphical model.

In this talk I will present some of the research related to this perspective and connect it to recent work in Deep Learning such as Value Iteration Networks and Graph Convolutional Networks.
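For readers unfamiliar with the underlying recursion, the sketch below runs classical tabular value iteration on a tiny, made-up deterministic chain MDP. It is not the method of the talk, only the standard dynamic-programming update that Value Iteration Networks build on; the MDP, array layout and function name are assumptions for this example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.

    P: (A x S x S) transition probabilities, R: (A x S) immediate rewards.
    Returns the optimal state values and a greedy policy (action index per state).
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # (A x S): reward now plus discounted future value
        V_new = Q.max(axis=0)            # act greedily in every state
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Chain of 3 states; action 0 = left, action 1 = right; state 2 is absorbing.
P = np.zeros((2, 3, 3))
P[0, 0, 0] = P[0, 1, 0] = P[0, 2, 2] = 1.0   # moving left
P[1, 0, 1] = P[1, 1, 2] = P[1, 2, 2] = 1.0   # moving right
R = np.zeros((2, 3))
R[1, 1] = 1.0                                 # reward for stepping from state 1 into state 2

V, policy = value_iteration(P, R)
```

The update propagates value backwards along the transition graph one step per iteration, which is what makes the message-passing view of planning natural.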

**Abstract**: Joint Causal Inference (JCI) is a recently proposed causal discovery framework that aims to discover causal relations based on multiple observational and experimental datasets, also in the presence of latent variables. Compared with current methods for causal inference, JCI makes it possible to jointly learn both the causal structure and the intervention targets by pooling data from different experimental conditions in a systematic way. This systematic pooling also improves the statistical power of the independence tests used to recover the causal relations, while the introduction of context variables can improve the identifiability of causal relations. In this talk I will introduce JCI and show two possible implementations using three recent causal discovery methods from the literature: Ancestral Causal Inference [Magliacane et al. 2016], [Hyttinen et al. 2014] and Greedy Fast Causal Inference [Ogarrio et al. 2016]. Moreover, I will show the benefits of JCI in an evaluation on synthetic data and in an application to the flow cytometry dataset from [Sachs et al. 2005].
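The pooling step with a context variable can be illustrated on toy data. The sketch below is an assumption-laden illustration, not JCI itself: the data-generating process, the mean-shift intervention on `X`, and all names are made up, and no causal discovery method is run. It only shows how two regimes are stacked into one dataset with an extra context column `C`, and that `C` is then statistically dependent on the intervened variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def sample(x_mean):
    """Toy system X -> Y; x_mean models a soft intervention shifting X."""
    x = rng.normal(loc=x_mean, size=n)
    y = 2.0 * x + rng.normal(size=n)
    return np.column_stack([x, y])

obs = sample(0.0)   # observational regime
exp = sample(3.0)   # experimental regime: X is shifted by the intervention

# Pool both regimes and prepend a context variable C (0 = observational, 1 = experimental).
pooled = np.vstack([
    np.column_stack([np.zeros(n), obs]),
    np.column_stack([np.ones(n), exp]),
])

# C is strongly dependent on its intervention target X in the pooled data.
corr_CX = np.corrcoef(pooled[:, 0], pooled[:, 1])[0, 1]
```

A causal discovery method applied to the pooled columns `(C, X, Y)` can then treat the intervention target as something to be learned from dependences involving `C`.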

**Abstract:** The world that our brains experience is quite different from the world that most of our ML models experience. Most models in machine learning are now trained by randomly sampling data from some training set, updating the model, then repeating. When temporal data is considered, it is usually split into short sequences, where each sequence is considered to be a sample from some underlying distribution of sequences, which we wish to learn. Humans, on the other hand, learn online: we receive a single, never-ending sequence of inputs. Moreover, these inputs arrive asynchronously, and rather than representing the state of the world at a given time, they represent that some aspect of the state of the world has changed.

In this talk, I’ll discuss some work we are doing to close this gap and to apply the methods used in deep learning to the more natural online-learning setting.

**Abstract:**

Deep learning has shown considerable success in a wide range of domains due to its rich parametric form and natural scalability to big datasets. Nevertheless, it has limitations that prevent its adoption in specific problems. Recent work has shown that neural networks suffer from over-parametrization, as they can be significantly pruned without any loss in performance. This means that a lot of computation and resources are wasted, and avoiding this waste can lead to large speedups. Furthermore, current neural networks suffer from unreliable uncertainty estimates, which prevents their use in domains that involve critical decision making and safety.

In this talk we will show how these two relatively distinct problems can be addressed under a common framework based on Bayesian inference. In particular, we will show that by adopting a more elaborate version of Gaussian dropout we can obtain deep learning models that provide robust uncertainty estimates on a variety of tasks and architectures, while simultaneously yielding compressed networks in which most of the parameters and computation have been removed.
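As background, plain Gaussian dropout multiplies each weight by noise drawn from N(1, alpha). The sketch below shows only this basic mechanism and a Monte Carlo estimate of predictive uncertainty; it is not the more elaborate version referred to above, and the function name, the fixed `alpha`, and the toy weights are assumptions for illustration.

```python
import numpy as np

def gaussian_dropout_linear(x, W, alpha, rng):
    """Linear layer with multiplicative Gaussian noise N(1, alpha) on every weight."""
    noise = 1.0 + np.sqrt(alpha) * rng.standard_normal(W.shape)
    return x @ (W * noise)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])
W = np.array([[1.0, -1.0],
              [0.5, 2.0],
              [-1.5, 0.5]])

# With alpha = 0 the noise vanishes and the layer is deterministic.
y_det = gaussian_dropout_linear(x, W, 0.0, rng)

# Keeping the noise on at prediction time gives a distribution over outputs.
samples = np.stack([gaussian_dropout_linear(x, W, 0.1, rng) for _ in range(4000)])
y_mean = samples.mean(axis=0)   # approximates the deterministic output
y_std = samples.std(axis=0)     # spread serves as a simple uncertainty estimate
```

In Bayesian treatments the noise level is typically learned per weight or per group, and weights whose noise dominates their signal become candidates for removal, which is how uncertainty and compression can be connected.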

**Abstract**: Structural causal models (SCMs), also known as non-parametric structural equation models (NP-SEMs), are widely used for causal modeling purposes. This talk consists of two parts: part one gives a rigorous treatment of structural causal models, dealing with measure-theoretic complications that arise in the presence of feedback, and part two deals with the marginalization of SCMs. In part one we deal with recursive models (those without feedback), models where the solutions to the structural equations are unique, and arbitrary non-recursive models, those where the solutions are non-existent or non-unique. We show how we can reason about causality in these models and show how this differs from the recursive causal perspective. In part two, we address the question of how to marginalize an SCM (possibly with feedback), consisting of endogenous and exogenous variables, to a subset of the endogenous variables. Marginalizing an SCM projects the SCM down to an SCM on a subset of the endogenous variables, leading to a more parsimonious but causally equivalent representation of the SCM. We give an abstract definition of marginalization and propose two approaches for marginalizing SCMs in a constructive way. Both constructive approaches define a marginalization operation that effectively removes a subset of the endogenous variables from the model and leads to an SCM that has the same causal semantics as the original SCM. We provide several conditions under which such marginalizations exist.
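A simple acyclic, linear example (chosen here for illustration, with made-up coefficients $a$ and $b$; it does not cover the feedback case or the measure-theoretic details) shows what marginalizing out one endogenous variable by substitution looks like. Removing $X_2$ from the SCM on the left gives the SCM on the right:

```latex
\begin{aligned}
X_1 &= E_1, \\
X_2 &= a X_1 + E_2, \\
X_3 &= b X_2 + E_3
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
X_1 &= E_1, \\
X_3 &= a b \, X_1 + (b E_2 + E_3).
\end{aligned}
```

Under an intervention $\mathrm{do}(X_1 = x)$, both models give $X_3 = a b \, x + b E_2 + E_3$, so the marginal model agrees with the original on interventions that involve only the remaining variables $X_1$ and $X_3$.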

**Abstract:** The elegance and simplicity of Bayesian Networks, i.e. probabilistic graphical models for directed acyclic graphs (DAGs), are rooted in the equivalence of several Markov properties, such as the recursive factorization property (rFP), which allows for sparse parametrization; the directed global Markov property (dGMP), which encodes all conditional independences; and the structural equation property (SEP), which expresses the variables in functional relations.

But as soon as we allow the graphical structure to have feedback loops and/or latent confounders, the mentioned equivalences break down. In this talk we will introduce a new graphical structure that can represent both latent confounding and feedback loops at once, show how to generalize the most important Markov properties to this case, and demonstrate how these Markov properties are logically related to each other. Furthermore, we will indicate how this new layer of theory might be used for causal discovery algorithms in the presence of latent confounders, non-linear functional relations and feedback loops.
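For reference, in the DAG case the three properties named above are usually written as follows, for a DAG $G$ with nodes $V$, parent sets $\mathrm{pa}(v)$, and d-separation $\perp_d$. The notation is a standard textbook one and is assumed here rather than taken from the talk:

```latex
\begin{align*}
\text{rFP:}  \quad & p(x_V) = \prod_{v \in V} p\big(x_v \mid x_{\mathrm{pa}(v)}\big), \\
\text{dGMP:} \quad & A \perp_d B \mid C \ \text{ in } G
                     \;\Longrightarrow\;
                     X_A \perp\!\!\!\perp X_B \mid X_C, \\
\text{SEP:}  \quad & X_v = f_v\big(X_{\mathrm{pa}(v)}, E_v\big)
                     \ \text{ with jointly independent noise variables } E_v .
\end{align*}
```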