You are all cordially invited to the AMLab seminar on Tuesday September 11 at 15:00(!) in C0.110 (FNWI, Amsterdam Science Park), where Cédric Archambeau (Amazon Core AI) will give a talk titled “Learning Representations for Hyperparameter Transfer Learning”. Afterwards there are the usual drinks and snacks.
Abstract: Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization, critical in deep learning. Typically, BO relies on conventional Gaussian process regression, whose algorithmic complexity is cubic in the number of evaluations. As a result, Gaussian process-based BO cannot leverage large numbers of past function evaluations, for example, to warm-start related BO runs. After a brief intro to BO and an overview of several use cases at Amazon, I will discuss a multi-task adaptive Bayesian linear regression model, whose computational complexity is attractive (linear in the number of function evaluations) and which can leverage information from related black-box functions through a shared deep neural net. Experimental results show that the neural net learns a representation suitable for warm-starting related BO runs and that these runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than competing neural net-based methods recently published in the literature.
This is joint work with Valerio Perrone, Rodolphe Jenatton, and Matthias Seeger. It will be presented at NIPS 2018.
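To make the complexity argument in the abstract concrete, here is a minimal sketch of plain Bayesian linear regression on a fixed feature map. It is not the model from the talk: the polynomial basis `toy_features` is a hypothetical stand-in for the learned neural net representation, and the prior precision `alpha` and noise precision `beta` are assumed known. The point is only that the posterior costs O(N D^2 + D^3) for N evaluations and D features, so it grows linearly in N.

```python
import numpy as np

def blr_posterior(features, targets, alpha=1.0, beta=25.0):
    """Posterior mean and covariance of the weights w in y = w^T phi(x) + noise.

    alpha is the prior precision and beta the noise precision (both assumed known).
    The cost is O(N D^2 + D^3) for N rows and D columns, i.e. linear in N.
    """
    d = features.shape[1]
    precision = alpha * np.eye(d) + beta * features.T @ features
    cov = np.linalg.inv(precision)
    mean = beta * cov @ features.T @ targets
    return mean, cov

def blr_predict(phi_new, mean, cov, beta=25.0):
    """Predictive mean and variance at new feature rows."""
    pred_mean = phi_new @ mean
    # Noise variance plus the quadratic form phi^T cov phi for each row.
    pred_var = 1.0 / beta + np.sum(phi_new @ cov * phi_new, axis=1)
    return pred_mean, pred_var

def toy_features(x):
    """Hypothetical fixed basis standing in for a learned representation."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

# Synthetic "past evaluations" of a noisy quadratic (illustrative data only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 0.5 - 1.0 * x + 2.0 * x ** 2 + rng.normal(scale=0.2, size=x.shape)

mean, cov = blr_posterior(toy_features(x), y)
pred_mean, pred_var = blr_predict(toy_features(np.array([0.0, 0.5])), mean, cov)
```

The predictive variance is what an acquisition function in BO would consume; doubling the number of past evaluations only doubles the cost of forming `features.T @ features`.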
Bio: Cédric is the science lead of Amazon Core AI, with teams in Berlin, Barcelona, Tuebingen, and Seattle. His work on democratizing machine learning enables teams at Amazon to deliver a wide range of machine learning-based products, including customer-facing services that are part of Amazon SageMaker (aws.amazon.com/sagemaker). Currently, he is interested in algorithms that learn representations, algorithms that learn to learn, and algorithms that avoid catastrophic forgetting (in deep learning). Prior to joining Amazon, he led the Machine Learning group at Xerox Research Centre Europe (now Naver Labs Europe). His team conducted applied research in machine learning, computational statistics and mechanism design, with applications in customer care, transportation and governmental services. He joined Amazon, Berlin, as an Applied Science Manager in October 2013, where he was in charge of delivering zero-parameter machine learning algorithms.
Now that the summer months are over, weekly AMLab seminars are starting up again. With a change: from now on, the talks will be held on Thursdays instead of Tuesdays.
You are all cordially invited to the AMLab seminar on Thursday September 6 at 16:00 in C3.163 (FNWI, Amsterdam Science Park), where Joris Mooij will give a talk titled “Validating Causal Discovery Methods”. Afterwards there are the usual drinks and snacks.
Since the pioneering work by Peirce and Fisher, the gold standard for causal discovery is a randomized experiment. An intriguing alternative approach to causal discovery was proposed in the nineties, based on conditional independence patterns in the data. Over the past decades, dozens of causal discovery methods based on that idea have been proposed. These methods clearly work on simulated data when all their assumptions are satisfied. However, demonstrating their usefulness on real data has been a challenge. In this talk, I will discuss some of our recent attempts at validating causal discovery methods on large-scale interventional data sets from molecular biology. I will discuss a micro-array gene expression data set and a mass cytometry data set that seem perfectly suited for validation of causal discovery methods at first sight. As it turns out, however, both causal discovery on these data and the validation of such methods are more challenging than one might initially think. We find that even sophisticated modern causal discovery algorithms are outperformed by simple (non-causal) baselines on these data sets.
(joint work with Philip Versteeg and Tineke Blom)
You are all cordially invited to the AMLab seminar on Tuesday August 28 at 16:00 in C3.163, where Tameem Adel will give a talk titled “On interpretable representations and the tradeoff between accuracy and interpretability”. Afterwards there are the usual drinks and snacks!
As machine learning models grow in size and complexity, and as applications reach critical social, economic and public health domains, learning interpretable data representations is becoming ever more important. Most of the current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. In our recent ICML-2018 paper, we proposed two rather contrasting interpretability frameworks. The first aims at controlling the accuracy vs. interpretability tradeoff by providing an interpretable lens for an existing model (which has already been optimized for accuracy). We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. Applying a flexible and invertible transformation to the input leads to an interpretable representation with no loss in accuracy. We extend the approach using an active learning strategy to choose the most useful side information to obtain, allowing a human to guide what “interpretable” means. The second framework relies on joint optimization for a representation which is both maximally informative about the side information and maximally compressive about the non-interpretable data factors. This leads to a novel perspective on the relationship between compression and regularization. We also propose an interpretability evaluation metric based on our frameworks. Empirically, we achieve state-of-the-art results on three datasets using the two proposed algorithms.
Tameem Adel is currently a research fellow in the Machine Learning Group at the University of Cambridge, advised by Prof. Zoubin Ghahramani. He was previously an AMLab postdoctoral researcher advised by Prof. Max Welling. He obtained his PhD from the University of Waterloo, Ontario, Canada, advised by Prof. Ali Ghodsi. His main research interests revolve around probabilistic graphical models, Bayesian learning and inference, medical (especially MRI-based) applications of machine learning, interpretability of deep models, and domain adaptation.
A few days after Dmitry Vetrov’s talk (Thursday morning), we’ll have another guest speaker: on Monday July 30 at 11:00 in C3.163, Jesse Bettencourt (University of Toronto) will give a talk titled “Neural Ordinary Differential Equations”.
Abstract: We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
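As a rough illustration of the idea in the abstract, the sketch below parameterizes the derivative of the hidden state with a tiny MLP and obtains the output by integrating from t = 0 to t = 1. It makes several simplifying assumptions that are not from the paper: the weights are random rather than trained, the solver is a hand-written fixed-step RK4 instead of a black-box adaptive solver, and no backpropagation through the solver is shown. The names `dynamics` and `odeint_rk4` are hypothetical.

```python
import numpy as np

def dynamics(h, t, params):
    """Tiny MLP returning dh/dt; stands in for the learned derivative network."""
    w1, b1, w2, b2 = params
    return np.tanh(h @ w1 + b1) @ w2 + b2

def odeint_rk4(f, h0, t0, t1, params, steps=20):
    """Integrate dh/dt = f(h, t, params) from t0 to t1 with fixed-step RK4."""
    h = h0
    t = t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t, params)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, params)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, params)
        k4 = f(h + dt * k3, t + dt, params)
        h = h + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += dt
    return h

# Random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
dim, hidden = 4, 16
params = (
    rng.normal(scale=0.3, size=(dim, hidden)), np.zeros(hidden),
    rng.normal(scale=0.3, size=(hidden, dim)), np.zeros(dim),
)

h0 = rng.normal(size=(3, dim))                       # batch of 3 input states
h1 = odeint_rk4(dynamics, h0, 0.0, 1.0, params)      # "network output" at t = 1
h1_fine = odeint_rk4(dynamics, h0, 0.0, 1.0, params, steps=200)
```

Comparing `h1` with `h1_fine` shows the precision versus speed trade-off mentioned in the abstract: fewer solver steps cost less compute but give a coarser approximation of the same continuous-depth model. A single Euler step with step size 1, `h0 + dynamics(h0, 0.0, params)`, has the form of a residual block.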
Next week, Dmitry Vetrov (Higher School of Economics & Samsung AI center, Moscow) will be visiting us, and will give a talk titled “Interesting properties of the variational dropout framework”. You are all cordially invited to this talk on Thursday morning July 26, at 11:00 in C1.112 (FNWI, Amsterdam Science Park).
Abstract: Recently it was shown that dropout, a popular regularization technique, can be treated as a Bayesian procedure. Such a Bayesian interpretation allows us to extend the initial model and to set individual dropout rates for each weight of a DNN. Variational inference automatically sets the rates to their optimal values, which surprisingly leads to very high sparsification of the DNN. The effect is similar in spirit to the well-known ARD procedure for linear models and neural networks. By exploiting a different extension, one may show that DNNs can be trained with extremely large dropout rates, even when the traditional signal-to-noise ratio is zero (e.g. when all weights in the layer have zero means and tunable variances). Coupled with recent discoveries about the loss landscape, these results provide a new perspective on building much more powerful yet compact ensembles and/or removing the redundancy in modern deep learning models. In the talk we will cover these topics and present our most recent results in exploring those models.
Bio: Dmitry Vetrov (graduated from Moscow State University in 2003, PhD in 2006) is a research professor at the Higher School of Economics, Moscow, and head of the deep learning lab at the Samsung AI center in Moscow. He is the founder and head of the Bayesian methods research group, which became one of the strongest research groups in Russia. Three of his former PhD students became researchers at DeepMind. His research focuses on combining the Bayesian framework with deep learning models. His group is also actively involved in building scalable tools for stochastic optimization, the application of tensor decomposition methods to large-scale ML, constructing cooperative multi-agent systems, etc.
UPDATE: This talk will be rescheduled to a new date after the summer.
You are all cordially invited to the AMLab seminar on Tuesday June 12 at 16:00 in C3.163, where Bela Mulder (AMOLF) will give a talk titled “Pitting man against machine in the arena of bottom-up design of crystal structures”. Afterwards there are the usual drinks and snacks!
Abstract: In this highly informal seminar I would like to pitch the question “Can a machine learning system develop a theory?” One of the much-touted properties of deep learning networks is that their deeper levels develop higher-order generalized representations of their inputs. This raises the question whether they are able to hit upon the type of hidden structures in physical problems that are the cornerstone of effective physical theories. I would like to propose to test this idea in a concrete setting related to the highly relevant question of inverse design of self-assembling matter. I have recently formulated a novel approach towards inferring the specific short-range isotropic interactions between particles of multiple types on lattices of given geometry in order that they spontaneously form specified periodic states of essentially arbitrary complexity. This approach rests upon the subtle intertwining of the group of transformations that leave the lattice structure invariant with the group of permutations in the set of particle types induced by these same transformations on the target ordered structure. The upshot of this approach is that the number of independent coupling constants in the lattice can be systematically reduced from O(N^2), where N is the number of distinct species, to O(N). The idea would be to see whether a machine learning approach, which uses the space of possible patterns and their trivial transforms under symmetry operations as input, the set of possible coupling constants as outputs, and feedback based on the degree to which the target structure is realized with these coupling constants, is able to “learn” the symmetry-based rules in a way that also generalizes to similar patterns not included in the training set.
You are all cordially invited to the AMLab seminar on Tuesday May 29 at 16:00 in C3.163, where Diederik Roijers (VU) will give a talk titled “Multiple objectives: because we (should) care about the user”. Afterwards there are the usual drinks and snacks!
Abstract: Multi-objective reinforcement learning is on the rise. In this talk, we discuss why multi-objective models and methods are a natural way to model real-world problems, can be highly beneficial, and can be essential if we want to optimise for actual users. First, we discuss both the intuitive and formal motivation for multi-objective decision making. Then, we introduce the utility-based approach, in which we show we can make better decisions by putting user utility at the centre of our models and methods. And finally, we discuss two example methods for two different scenarios for using multi-objective models and methods, as well as open challenges.
You are all cordially invited to the AMLab seminar on Tuesday May 22 at 16:00 in C3.163, where Taco Cohen will give a talk titled “The Quite General Theory of Equivariant Convolutional Networks”. Afterwards there are the usual drinks and snacks!
Abstract: Group equivariant and steerable convolutional neural networks (regular and steerable G-CNNs) have recently emerged as a very effective model class for learning from signal data such as 2D and 3D images, video, and other data where symmetries are present. In geometrical terms, regular G-CNNs represent data in terms of scalar fields (“feature channels”), whereas the steerable G-CNN can also use vector and tensor fields (“capsules”) to represent data. In this paper we present a general mathematical framework for G-CNNs on homogeneous spaces like Euclidean space or the sphere. We show that the layers of an equivariant network are convolutional if and only if the input and output feature spaces transform like a field. This result establishes G-CNNs as a universal class of equivariant network architectures. Furthermore, we study the space of equivariant filter kernels (or propagators), and show how an understanding of this space can be used to construct G-CNNs for general fields over homogeneous spaces. Finally, we discuss several applications of the theory, such as 3D model recognition, molecular energy regression, analysis of protein structure, omnidirectional vision, and others.
The goal of this talk is to explain this new mathematical theory in a way that is accessible to the machine learning community.
You are all cordially invited to the AMLab seminar on Tuesday May 15 at 16:00 in C3.163, where Emiel Hoogeboom will give a talk titled “G-HexaConv”. Afterwards there are the usual drinks and snacks!
Abstract: The effectiveness of Convolutional Neural Networks stems in large part from their ability to exploit the translation invariance that is inherent in many learning problems. Recently, it was shown that CNNs can exploit other invariances, such as rotation invariance, by using group convolutions instead of planar convolutions. However, for reasons of performance and ease of implementation, it has been necessary to limit the group convolution to transformations that can be applied to the filters without interpolation. Thus, for images with square pixels, only integer translations, rotations by multiples of 90 degrees, and reflections are admissible.
Whereas the square tiling provides a 4-fold rotational symmetry, a hexagonal tiling of the plane has a 6-fold rotational symmetry. In this paper we show how one can efficiently implement planar convolution and group convolution over hexagonal lattices, by re-using existing highly optimized convolution routines. We find that, due to the reduced anisotropy of hexagonal filters, planar HexaConv provides better accuracy than planar convolution with square filters, given a fixed parameter budget. Furthermore, we find that the increased degree of symmetry of the hexagonal grid increases the effectiveness of group convolutions, by allowing for more parameter sharing. We show that our method significantly outperforms conventional CNNs on the AID aerial scene classification dataset, even outperforming ImageNet pre-trained models.
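The square-lattice case that the abstract contrasts against can be sketched in a few lines. The example below is not the HexaConv implementation: it only shows, for square pixels, that a filter can be rotated by multiples of 90 degrees without interpolation (`np.rot90` just permutes entries), and that a first-layer group correlation built from the four rotated copies is equivariant to rotating the input. The function names `correlate_valid` and `p4_lifting_conv` are hypothetical, and the image and filter are random.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding ("valid" mode)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def p4_lifting_conv(image, kernel):
    """Correlate with the four 90-degree rotations of one filter.

    Returns an array of shape (4, H', W'): one feature map per rotation.
    """
    return np.stack([correlate_valid(image, np.rot90(kernel, r)) for r in range(4)])

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

out = p4_lifting_conv(image, kernel)
out_rot = p4_lifting_conv(np.rot90(image), kernel)

# Rotating the input rotates each feature map and cyclically shifts the
# rotation channels: out_rot[r] equals rot90(out[r - 1]).
expected = np.stack([np.rot90(out[(r - 1) % 4]) for r in range(4)])
is_equivariant = np.allclose(out_rot, expected)
```

The same filter weights are shared across all four orientation channels, which is the parameter sharing the abstract refers to. On a hexagonal lattice there would be six exact rotations instead of four.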
You are all cordially invited to the AMLab seminar on Tuesday April 24 at 16:00 in C3.163, where Zeynep Akata will give a talk titled “Representing and Explaining Novel Concepts with Minimal Supervision”. Afterwards there are the usual drinks and snacks!
Abstract: Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. In this talk, I will present my past and current work on Zero-Shot Learning, Vision and Language for Generative Modeling and Explainable Artificial Intelligence, covering (1) how we can generalize image classification models to cases with no visual training data available, (2) how to generate images and image features using detailed visual descriptions, and (3) how our models focus on discriminating properties of the visible object, jointly predict a correct and an incorrect class label, and explain why the predicted correct label is appropriate for the image and why the predicted incorrect label is not.