**Abstract:** A graphical model is a structured representation of the data generating process. The traditional method to reason over random variables is to perform inference in this graphical model. However, in many cases the generating process is only a poor approximation of the much more complex true data generation process, leading to poor posterior estimates. The subtleties of the generative process are however captured in the data itself and we can “learn to infer”, that is, learn a direct mapping from observations to explanatory latent variables. In this work we propose a hybrid model that combines graphical inference with a learned inverse model, which we structure as a graph neural network. The iterative algorithm is formulated as a recurrent neural network. By using cross-validation we can automatically balance the amount of work performed by graphical inference versus learned inference. We apply our ideas to the Kalman filter, a Gaussian hidden Markov model for time sequences. We apply our “Graphical Recurrent Inference” method to a number of path estimation tasks and show that it successfully outperforms either learned or graphical inference run in isolation.
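
For readers unfamiliar with the graphical-inference baseline mentioned above, the following is a minimal sketch of a scalar Kalman filter. It is not the hybrid "Graphical Recurrent Inference" method from the talk. The random-walk state model, the function names, and the default noise variances are all illustrative assumptions.

```python
def kalman_step(mean, var, obs, process_var, obs_var):
    """One predict/update step of a scalar Kalman filter.

    Assumes (for illustration only) a random-walk state model:
    x_t = x_{t-1} + process noise, y_t = x_t + observation noise.
    """
    # Predict: the mean carries over, the uncertainty grows by the process noise.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and observation, weighted by the Kalman gain.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (obs - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def kalman_filter(observations, init_mean=0.0, init_var=1.0,
                  process_var=0.01, obs_var=0.25):
    """Run the filter over a sequence and return the filtered means."""
    mean, var = init_mean, init_var
    estimates = []
    for obs in observations:
        mean, var = kalman_step(mean, var, obs, process_var, obs_var)
        estimates.append(mean)
    return estimates


if __name__ == "__main__":
    noisy_path = [0.9, 1.2, 0.8, 1.1, 1.0]  # hypothetical noisy measurements
    print(kalman_filter(noisy_path))
```

When the assumed noise variances or dynamics are a poor match for the real process, these estimates degrade, which is the situation the hybrid model is meant to address.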

**Abstract:** Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1 x 1 convolutions proposed in Glow to invertible d x d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d x d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet.
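
As background, the sketch below illustrates the 1 x 1 invertible convolution from Glow that the talk generalizes. It does not implement emerging or periodic convolutions. It assumes a two-channel image stored as nested lists, and all function names are hypothetical. A 1 x 1 convolution applies the same channel-mixing matrix at every pixel, so its log-determinant is the pixel count times log|det W|.

```python
import math


def conv1x1(image, weight):
    """Apply a 2x2 channel-mixing matrix at every pixel.

    image[h][w] is a two-channel pixel [c0, c1]; weight is a 2x2 matrix.
    """
    return [[[weight[0][0] * p[0] + weight[0][1] * p[1],
              weight[1][0] * p[0] + weight[1][1] * p[1]]
             for p in row]
            for row in image]


def det2(weight):
    """Determinant of a 2x2 matrix."""
    return weight[0][0] * weight[1][1] - weight[0][1] * weight[1][0]


def inverse2(weight):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    d = det2(weight)
    return [[weight[1][1] / d, -weight[0][1] / d],
            [-weight[1][0] / d, weight[0][0] / d]]


def log_det_jacobian(image, weight):
    """Log |det Jacobian| of the whole layer: one factor of det(W) per pixel."""
    height, width = len(image), len(image[0])
    return height * width * math.log(abs(det2(weight)))


if __name__ == "__main__":
    img = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -1.0], [2.0, 0.0]]]
    w = [[2.0, 1.0], [0.0, 3.0]]
    out = conv1x1(img, w)
    restored = conv1x1(out, inverse2(w))  # inverting W undoes the layer
    print(restored, log_det_jacobian(img, w))
```

A d x d kernel mixes neighbouring pixels as well as channels, so its Jacobian no longer factorizes per pixel in this simple way.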

**Abstract:** In this seminar, I’ll present a project I’ve been working on with Sandeep Manjanna and Gregory Dudek (Mobile Robotics Lab, McGill University). In this project, we investigated selective coverage strategies for a robot tasked with surveying or searching prioritised locations in a given area. This problem can be modelled as a Markov decision process and solved with reinforcement learning strategies, but the state space is extremely large, requiring these states to be aggregated. The proposed state aggregation method is shown to generalize well between different environments. In field tests over reefs at the Folkestone Marine Reserve, an autonomous surface vehicle using this method was able to increase the number of usable visual data samples.

**Abstract:** The so-called causal Markov and causal faithfulness assumptions are well-established pillars behind causal discovery from observational data. The first is closely related to the memorylessness property of dynamical systems, and allows us to predict observable conditional independencies in the data from the underlying causal model. The second is the causal equivalent of Ockham’s razor, and enables us to reason backwards from data to the causal model of interest.

Though these assumptions are theoretically reasonable, in practice, with limited data from real-world systems, we often encounter violations of faithfulness. Some of these, like weak long-distance interactions, are handled surprisingly well by benchmark constraint-based algorithms such as FCI. Other violations may imply inconsistencies between observed (conditional) independence statements in the data that cannot currently be handled both effectively and efficiently by most constraint-based algorithms. A fundamental question is whether our output retains any validity when not all our assumptions are satisfied, or whether it is still possible to reliably rescue parts of the model.

In this talk we introduce a novel approach based on a relaxed form of the faithfulness assumption that is able to handle many of the detectable faithfulness violations efficiently while ensuring the output causal model remains valid. Essentially, we obtain a principled and efficient form of error correction on observed in/dependencies that can significantly improve both the accuracy and the reliability of the output causal models in practice. Admittedly, it cannot handle all possible violations, but the relaxed faithfulness assumption may be a promising step towards a more realistic, and so more effective, underpinning of the challenging task of causal discovery from real-world systems.

**Abstract**: Group convolutional neural networks (GCNN) are symmetric under predefined, invertible transformations of the input, e.g. rotations, flips, and translations. Can we extend this framework in the absence of invertibility, for instance in the case of pixelated image downscalings, or causal time-shifting of audio signals? To this end, I present Semigroup Convolutional Neural Networks (SCNN), a generalisation of GCNNs based on the related theory of semigroups. I will showcase a specialisation of a scale-equivariant SCNN, where the activations of each layer of the network live on a classical scale-space, finally linking the classical field of scale-spaces and modern deep learning.

**Abstract**: We present a convolutional network that is equivariant to rigid body motions. The model uses scalar-, vector-, and tensor fields over 3D Euclidean space to represent data, and equivariant convolutions to map between such representations. These SE(3)-equivariant convolutions utilize kernels which are parameterized as a linear combination of a complete steerable kernel basis, which is derived analytically in this paper. We prove that equivariant convolutions are the most general equivariant linear maps between fields over R^3. Our experimental results confirm the effectiveness of 3D Steerable CNNs for the problem of amino acid propensity prediction and protein structure classification, both of which have inherent SE(3) symmetry.

**Abstract**: Relational data is, roughly speaking, any form of data that can be represented as a graph: A social network, user preference data, protein-protein interactions, etc. A recent body of work, by myself and others, aims to develop a statistical theory of such data for problems where a single graph is observed (such as a small part of a large social network). Keywords include graphon, edge-exchangeable and sparse exchangeable graphs, and many latent variable models used in machine learning. I will summarize the main ideas and results of this theory: How and why the exchangeability assumptions implicit in commonly used models for such data may fail; what can be done about it; what we know about convergence; and implications of these results for methods popular in machine learning, such as graph embeddings and empirical risk minimization.

**Bio**: Peter Orbanz is associate professor of statistics at Columbia University. His research interests include network and relational data, Bayesian nonparametrics, symmetry principles in machine learning and statistics, and hierarchies of latent variables. He was an undergraduate student at the University of Bonn, a PhD student at ETH Zurich, and a postdoctoral fellow at the University of Cambridge.

**Abstract**: Understanding the functionalities of high-level features from deep neural networks (DNNs) is a long-standing challenge. Towards this goal, we propose a channel-recurrent architecture in place of the vanilla fully-connected layers to construct more interpretable and expressive latent spaces. Building on Variational Autoencoders (VAEs), we integrate recurrent connections across channels into both the inference and generation steps, allowing the high-level features to be captured in a global-to-local, coarse-to-fine manner. Combined with an adversarial loss as well as two novel regularizations, namely a KL objective weighting scheme over time steps and mutual information maximization between transformed latent variables and the outputs, our channel-recurrent VAE-GAN (crVAE-GAN) outperforms VAE-GAN in generating a diverse spectrum of high-resolution images while maintaining the same level of computational efficacy. Moreover, when applying crVAE-GAN in an attribute-conditioned generative setup, we add an attention mechanism over each attribute to indicate the specific latent subset responsible for its modulation, further imposing semantic meaning on the latent spaces. Evaluation is through both qualitative visual examination and quantitative metrics.

**Abstract**: The complexity of the functions a neural network approximates makes it hard to explain what the classification decision is based on. In this work, we present a framework that exposes more information about this decision-making process. Instead of producing a classification in a single step, our model iteratively makes binary sub-decisions which, when combined as a whole, ultimately produce the same classification result while revealing a decision tree as thought process. While there is generally a trade-off between interpretability and accuracy, the insights our model generates come at a negligible loss in accuracy. The decision tree resulting from the sequence of binary decisions of our model reveals a hierarchical clustering of the data and can be used as learned attributes in zero-shot learning.

**Abstract**: Optimal Transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show how this principle dictates the minimization of the Wasserstein distance between the encoder aggregated posterior and the prior, plus a reconstruction error. We prove that in the non-parametric limit the autoencoder generates the data distribution if and only if the two distributions match exactly, and that the optimum can be obtained by deterministic autoencoders. We then introduce the Sinkhorn AutoEncoder (SAE), which casts the problem into Optimal Transport on the latent space. The resulting Wasserstein distance is minimized by backpropagating through the Sinkhorn algorithm. SAE models the aggregated posterior as an implicit distribution and therefore does not need a reparameterization trick for gradient estimation. Moreover, it requires virtually no adaptation to different prior distributions. We demonstrate its flexibility by considering models with hyperspherical and Dirichlet priors, as well as a simple case of probabilistic programming. SAE matches or outperforms other autoencoding models in visual quality and FID scores.

Joint work with Marcello Carioni (KFU Graz), Patrick Forré, Samarth Bhargav, Max Welling, Rianne van den Berg, Tim Genewein (Bosch Centre for AI), Frank Nielsen (Ecole Polytechnique)
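
To give a concrete picture of the Sinkhorn algorithm mentioned in the abstract, here is a minimal sketch of the standard entropy-regularized iterations between two small discrete distributions. It is not the SAE training procedure: there is no encoder, no backpropagation, and the cost matrix, regularization strength `eps`, and iteration count are illustrative assumptions.

```python
import math


def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport plan between histograms a and b.

    cost[i][j] is the cost of moving mass from bin i of a to bin j of b.
    Returns the transport plan as a nested list.
    """
    n, m = len(a), len(b)
    # Gibbs kernel: low-cost pairs get large weights.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of a transport plan."""
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(plan)) for j in range(len(plan[0])))


if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical pairwise costs
    plan = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
    print(plan, transport_cost(plan, cost))
```

Each iteration consists only of differentiable operations (exponentials, sums, divisions), which is what makes it possible to backpropagate through the loop in an automatic differentiation framework.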
