**Abstract**: Clearly explaining the rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account the class-discriminative image aspects that justify visual predictions. In this talk, I will present my past and current work on Zero-Shot Learning, Vision and Language for Generative Modeling, and Explainable Artificial Intelligence, covering (1) how we can generalize image classification models to cases where no visual training data is available, (2) how to generate images and image features from detailed visual descriptions, and (3) how our models focus on discriminative properties of the visible object, jointly predict a correct and an incorrect class label, and explain why the predicted correct label is appropriate for the image and why the incorrect one is not.

**Abstract**: Structural causal models (SCMs) are a popular tool to describe causal relations in systems in many fields, such as economics, the social sciences, and biology. Complex (cyclic) dynamical systems, such as chemical reaction networks, are often described by a set of ODEs. We show that SCMs are in general not flexible enough to give a complete causal representation of equilibrium states in these dynamical systems. Since such systems form an important modeling class for real-world data, we extend the concept of an SCM to a generalized structural causal model. We show that this allows us to capture the essential causal semantics that characterize dynamical systems. We illustrate our approach on a basic enzymatic reaction.
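As a hedged illustration of the kind of system meant here (the talk's exact example may differ), a basic enzymatic reaction $E + S \rightleftharpoons ES \rightarrow E + P$ under mass-action kinetics yields an ODE system such as

$$\frac{d[ES]}{dt} = k_1 [E][S] - (k_{-1} + k_2)[ES], \qquad \frac{d[P]}{dt} = k_2 [ES],$$

whose equilibrium states are obtained by setting the right-hand sides to zero. It is the causal semantics of such equilibrium states that ordinary SCMs fail to represent completely in general.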

On **Monday April 9** at 16:00 in room **C1.112**, **Avital Oliver** (Google Brain) will give a talk titled “**Realistic Evaluation of Semi-Supervised Learning Algorithms**”;

On **Tuesday April 10** at 16:00 in room **F1.02**, **Petar Veličković** (University of Cambridge) will give a talk titled “**Keeping our graphs attentive**”.

Abstracts and bios are included below. Afterwards there will be the usual drinks and snacks. (Note that room F1.02 for Petar’s talk is a several-minute walk from the main entrance.)

**Avital Oliver: Realistic Evaluation of Semi-Supervised Learning Algorithms**

**Abstract**: Semi-supervised learning (SSL) leverages unlabeled data when labels are limited or expensive to obtain. Approaches based on neural networks have recently proven successful on standard benchmark tasks. In this talk, I will argue that these benchmarks fail to simulate many aspects of real-world applicability.

In order to better test these approaches, I will present a suite of experiments designed to address these issues. These experiments find that simple baselines which do not use unlabeled data can be competitive with the state-of-the-art, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples.

(Joint work with Augustus Odena, Colin Raffel, Ekin Dogus Cubuk and Ian Goodfellow)

**Bio**: Avital Oliver is a Google Brain Resident, currently working on semi-supervised learning. His research interests are in data-efficient learning, clustering with neural networks, neural-network loss landscapes, and applications to education. He previously interned at OpenAI, and graduated summa cum laude with an M.Sc. degree in Mathematics from Bar-Ilan University, where he did research in group theory.

**Petar Veličković: Keeping our graphs attentive**

**Abstract**: A multitude of important real-world datasets (especially in biology) come together with some form of graph structure: social networks, citation networks, protein-protein interactions, brain connectome data, etc. Extending neural networks to properly deal with this kind of data is therefore a very important direction for machine learning research, but one that had received comparatively little attention until very recently.

Attentional mechanisms represent a very promising direction for extending the established convolutional operator on images to arbitrary graphs, as they satisfy many of the desirable properties of a convolutional operator. In this talk, I will focus on my work on Graph Attention Networks (GATs), where these theoretical properties have been further validated by solid results on transductive as well as inductive node classification benchmarks. I will also outline some of the earlier efforts towards deploying attention-style operators on graph structures, as well as very exciting recent work that expands on GATs and deploys them in more general circumstances (such as EAGCN, DeepInf, and applications to solving the Travelling Salesman Problem). Time permitting, I will also present some of the related graph-based work on computational biology currently ongoing in my research group in Cambridge.
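The attention mechanism behind GATs can be sketched in a few lines. Below is a minimal, hedged single-head NumPy version (the actual model uses multi-head attention, learned parameters, and efficient batched computation; the function name and shapes here are illustrative assumptions):

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """Single-head graph attention layer (sketch).

    h:   (N, F)  node features
    adj: (N, N)  binary adjacency matrix (including self-loops)
    W:   (F, F') shared linear transform
    a:   (2*F',) attention vector
    """
    z = h @ W                                    # (N, F') transformed features
    N = z.shape[0]
    # Unnormalized attention: e_ij = LeakyReLU(a^T [z_i || z_j])
    e = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else alpha * s  # LeakyReLU
    # Mask out non-edges, then softmax over each node's neighbourhood
    e = np.where(adj > 0, e, -1e9)
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)
    return att @ z                               # (N, F') aggregated features
```

Because the attention coefficients are computed per edge, the same operator applies to graphs of any size and structure, which is what makes it usable in the inductive setting mentioned above.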

Finally, I will present the aims of my ongoing collaboration with Thomas Kipf, centered on leveraging the intermediate information computed by a GAT layer as a proxy for more challenging tasks, such as graph classification.

**Bio**: Petar Veličković is currently a final-year PhD student in Machine Learning and Bioinformatics at the Department of Computer Science and Technology of the University of Cambridge. He also holds a BA degree in Computer Science from Cambridge, having completed the Computer Science Tripos in 2015. In addition, he has been involved in research placements at Nokia Bell Labs (working with Nicholas Lane) and the Montréal Institute for Learning Algorithms (working with Adriana Romero and Yoshua Bengio). His current research interests broadly involve devising neural network architectures that operate on nontrivially structured data (such as graphs), and their applications in bioinformatics and medicine. He has published his work in these areas at both machine learning venues (ICLR, NIPS ML4H) and biomedical venues and journals (Bioinformatics, PervasiveHealth).

**Abstract**: Reconstructing three-dimensional structures from noisy two-dimensional orthographic projections is a central task in many scientific domains; examples range from medical tomography to single-particle electron microscopy.

We treat this problem from a Bayesian point of view. Specifically, we regard a specimen’s structure and its pose as latent factors which are marginalized over. This allows us to express uncertainty in pose and even local uncertainty in the sample’s structure. This information can serve to detect unstable sub-structures or multiple configurations of a specimen. In particular, we apply amortized deep neural networks to encode observations into latent factors. This offers the advantage of transferability across multiple structures. To this end, we propose to train the model alternately in observation space and latent space, resulting in a generalized version of the wake-sleep algorithm.

We focus our experiments on cryogenic electron microscopy (CryoEM) single particle analysis, a technique that enables deep understanding of structural biology and chemistry by inspecting single proteins. We show our model to be competitive while predicting reasonable uncertainties. Moreover, we empirically demonstrate that the model is more data-efficient than competing methods and that it is transferable between molecules.

**Abstract**: We propose a framework for solving combinatorial optimization problems whose output can be represented as a sequence of input elements. As an alternative to the Pointer Network, we parameterize a policy by a model based entirely on (graph) attention layers, and train it efficiently using REINFORCE with a simple and robust baseline based on a deterministic (greedy) rollout of the best policy found during training. We significantly improve over state-of-the-art results for learning algorithms for the 2D Euclidean TSP, reducing the optimality gap for a single tour construction by more than 75% (to 0.33%) and 50% (to 2.28%) for instances with 20 and 50 nodes respectively.
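The training signal described above can be sketched as follows. This is a minimal, hedged NumPy illustration of the REINFORCE estimator with a greedy-rollout baseline (in the real framework the policy is an attention model and gradients flow through `log_prob`; the function names here are illustrative assumptions):

```python
import numpy as np

def tour_length(coords, tour):
    """Total length of a closed 2D tour visiting the given node order."""
    pts = coords[tour]
    return float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())

def reinforce_loss(log_prob, sample_cost, greedy_cost):
    """Surrogate loss for REINFORCE with a greedy-rollout baseline:
    grad J ~ (L(sampled tour) - L(greedy tour)) * grad log p(sampled tour).

    log_prob:    log-probability of the sampled tour under the current policy
    sample_cost: length of the tour sampled from the policy
    greedy_cost: length of a deterministic (greedy) rollout of the best
                 policy found so far -- the baseline
    """
    advantage = sample_cost - greedy_cost  # negative if the sample beat greedy
    return advantage * log_prob
```

Because the baseline is itself a policy rollout on the same instance, it needs no extra learned value network and adapts as the policy improves.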

**Abstract**: In quantum computation one of the key challenges is to build fault-tolerant logical qubits. A logical qubit consists of several physical qubits. In stabilizer codes, a popular class of quantum error correction schemes, a part of the system of physical qubits is measured repeatedly, without measuring (and collapsing by the Born rule) the state of the encoded logical qubit. These repetitive measurements are called syndrome measurements, and must be interpreted by a classical decoder in order to determine what errors occurred on the underlying physical system. The decoding of these space- and time-correlated syndromes is a highly non-trivial task, and efficient decoding algorithms are known only for a few stabilizer codes. In this talk I will explain how we design and train decoders based on recurrent neural networks.
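As a hedged illustration of the idea (the architecture, shapes, and output classes here are illustrative assumptions, not the decoders from the talk), a recurrent decoder folds the time-correlated syndrome rounds into a hidden state and outputs a classical correction:

```python
import numpy as np

def rnn_decoder(syndromes, Wx, Wh, Wo):
    """Toy recurrent decoder sketch: consumes one round of syndrome
    measurements per time step and outputs logits over correction classes
    (e.g. I, X, Y, Z on the encoded logical qubit).

    syndromes: (T, S) array, one row of S syndrome bits per round
    Wx: (H, S), Wh: (H, H), Wo: (C, H) weight matrices
    """
    h = np.zeros(Wh.shape[0])
    for s in syndromes:               # fold time-correlated rounds into h
        h = np.tanh(Wx @ s + Wh @ h)
    return Wo @ h                     # logits for the classical correction
```

The appeal of a learned decoder is that the same recurrent machinery can, in principle, be trained for codes where no efficient hand-designed decoding algorithm is known.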

**Abstract**: Deep learning has been very successful in many applications, but there are a number of challenges that still need to be addressed:

1) DL does not provide reliable confidence intervals

2) DL is susceptible to small adversarial input perturbations

3) DL easily overfits

4) DL uses too much energy and memory

In this talk I will argue that we should be looking at stochastic DL models where the hidden units are noisy. We can train these models with variational methods.

A number of interesting connections emerge in such models:

1) The noisy hidden units form an information bottleneck

2) Through local reparameterization we can interpret these models as Bayesian

3) The noise can be used to create privacy preserving models

4) Stochastic quantization to low bit-width can make DL more power- and memory-efficient

This talk will not go into great depth on these topics, but will rather paint the larger picture.
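To make the noisy-hidden-unit idea concrete, here is a minimal, hedged NumPy sketch of a linear layer with multiplicative Gaussian noise, sampled via the local reparameterization trick mentioned in connection (2) (the specific Gaussian-dropout form and the name `noisy_linear` are illustrative assumptions, not the talk's exact model):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_linear(x, W, b, alpha=0.1):
    """Linear layer with multiplicative Gaussian noise on the weights,
    W_ij * (1 + sqrt(alpha) * eps), sampled via local reparameterization:
    rather than sampling noisy weights, sample the pre-activations directly
    from their induced Gaussian, which greatly reduces gradient variance.
    """
    mean = x @ W + b                     # E[a] under the noise distribution
    var = alpha * (x ** 2) @ (W ** 2)    # Var[a] induced on the activations
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)
```

Setting `alpha=0` recovers a deterministic layer; increasing it injects exactly the kind of hidden-unit noise that links these models to information bottlenecks, Bayesian interpretations, and privacy.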

**Abstract**: Will administering a certain chemical cause a cancer cell to stop multiplying? To answer this and other scientific “what-if” questions, we need causal models, which describe the cause-effect relations within a system of interest. Because even domain experts may not know the right causal model, we want to learn it automatically from large-scale data. This problem is called causal discovery, and is very difficult: the signals in the data that allow us to distinguish different causal models are often weak, so we need to be careful when interpreting them. Also, the number of candidate models that must be considered makes this problem computationally challenging. I will present some of my recent results which are an important step towards developing a statistically accurate and computationally efficient algorithm for causal discovery.

**Abstract**: The successful uptake of deep neural networks in high-risk domains is contingent on the capability to ensure minimal-risk guarantees. This requires that deep neural networks provide predictive uncertainty of high quality. Amortized variational inference provides a promising direction to achieve this, but demands a flexible yet tractable approximate posterior, which is an open area of research. We propose “SQUAVI”, a novel and flexible variational inference model that imposes a multinomial distribution on quantized latent variables. The proposed method is scalable, self-normalizing and sample efficient, and we demonstrate that the model utilizes the flexible posterior to its full potential, learns interesting non-linearities, and provides predictive uncertainty of competitive quality.

**Abstract**: A major challenge in Bayesian Optimization is the boundary issue (Swersky, 2017) where an algorithm spends too many evaluations near the boundary of its search space. In this paper we propose BOCK, Bayesian Optimization with Cylindrical Kernels, whose basic idea is to transform the ball geometry of the search space using a cylindrical transformation. Because of the transformed geometry, the Gaussian Process-based surrogate model spends less budget searching near the boundary, while concentrating its efforts relatively more near the center of the search region, where we expect the solution to be located. We evaluate BOCK extensively, showing that it is not only more accurate and efficient, but it also scales successfully to problems with a dimensionality as high as 500. We show that the better accuracy and scalability of BOCK even allows optimizing modestly sized neural network layers, as well as neural network hyperparameters.
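A minimal sketch of the geometric idea, assuming the cylindrical transformation factors a search point into a radius and a direction (the function names are illustrative, not BOCK's actual implementation):

```python
import numpy as np

def ball_to_cylinder(x, eps=1e-12):
    """Re-express a point x in the d-dimensional search ball as a radius
    r = ||x|| and a unit direction x / ||x||. A kernel defined over
    (r, direction) treats all points at the same radius comparably, so the
    surrogate model no longer over-allocates budget to the boundary shell,
    which occupies most of the ball's volume in high dimensions."""
    r = np.linalg.norm(x)
    direction = x / (r + eps)
    return r, direction

def cylinder_to_ball(r, direction):
    """Inverse map: recover the original search point."""
    return r * direction
```

The hedge to keep in mind: the real BOCK kernel is defined on this transformed geometry inside a Gaussian Process surrogate; the round-trip above only illustrates the change of coordinates.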