This will be the last seminar before the summer. We will start again in September.

**Abstract**: Old-school trading used to be a business with very limited use of statistics. Due to increasing automation and continuous technological advancement in infrastructure, statistics have now found their way into trading. In this presentation we will discuss how we at Flow Traders use machine learning and how we imagine its use in the future. We will show you examples of how machine learning methods like neural networks and algorithms like gradient descent can help us capture the information content of financial markets.

**Abstract**:

A long-standing goal in AI has been to mimic the natural ability of human beings to infer things about sensory inputs and unforeseen data, usually involving a combination of logical and probabilistic reasoning. The last 10 years of research in statistical relational models have demonstrated how one can successfully borrow syntactic devices from first-order logic to define large graphical models over complex interacting random variables, classes, hierarchies, dependencies and constraints. Statistical relational models continue to be widely used for learning in large-scale knowledge bases, probabilistic configurations, natural language processing, question answering, probabilistic programming and automated planning.

While this progress has been significant, there are some fundamental limitations in the expressivity of these models. Statistical relational models make the finite domain assumption: given a clause such as “friends of smokers are smokers themselves”, the set of friends and those who smoke is assumed to be finite and known. This makes it difficult to talk about unknown atoms and values (e.g., “All of John’s friends are worth more than a million”), categorical assumptions (e.g., “every animal eats”) and identity uncertainty (e.g., “James’ partner wore a red shawl”). Currently, approaches often simply ignore this issue, or deal with it in ad hoc ways.

In this work, we attempt to study this systematically. We begin with first-order probabilistic relational models, but now allow quantifiers to range over infinite sets. Although that makes matters undecidable in general, we show that, when limited to certain classes of statements, probabilistic reasoning becomes computable with attractive properties (e.g., it satisfies the additive and equivalence axioms of probability in a first-order setting).

Parts of this work appeared at AAAI-17.

**Biography**:

Vaishak Belle is a Chancellor’s Fellow/Lecturer at the School of Informatics, University of Edinburgh, UK. Vaishak’s research is in artificial intelligence, specifically on the theme of unifying logic and probability in different guises. Previously, he was at KU Leuven, the University of Toronto, and the Aachen University of Technology. He has co-authored several articles in AI-related venues, and won the Microsoft best paper award at UAI, the Machine Learning journal best student paper award at ECML-PKDD, and the Kurt Goedel silver medal.

**Abstract**: Integrating multiple sources of molecular measurements (such as RNA, micro RNA, and methylation data) across pan-cancer cohorts is a promising approach to learning general, non-cohort-specific disease profiles. These profiles provide rich representations of patients that can be used to learn novel subtypes and biomarkers, and are useful for survival prognoses and potentially drug discovery. However, combining cohorts is challenging, in part because the main signal in the data is tissue-specific. Special care has to be taken to avoid simply “learning the tissue”. In this talk I will describe an approach based on the variational auto-encoder, popular in the deep learning community, to learn an unsupervised latent representation of patients (the disease profile) that explicitly removes tissue/cohort information. Preliminary results indicate that the disease profiles carry little information about tissues and that this improves the profiles’ usefulness on other validation tasks, such as predicting cohort-specific survival and DNA mutations.

**Abstract**: Besides translational invariances, a broad class of images like medical or astronomical data exhibits rotational invariances. While such a priori knowledge was typically exploited by data augmentation, recent research shifts focus to directly implementing rotational equivariance into model architectures. I will present Steerable Filter CNNs which efficiently incorporate rotation equivariance by learning steerable filters. Two approaches, based on orientation-pooling or group-convolutions, are presented and discussed. A common weight initialization scheme is generalized to networks which learn filter banks as a linear combination of a fixed system of atomic filters.
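As a minimal illustration of what "steerable" means in the abstract above, the sketch below uses the textbook first-derivative-of-Gaussian pair as the fixed atomic filters. This choice of basis and all function names are assumptions made for illustration; they are not the filters or the architecture from the talk. It shows that a rotated copy of the filter can be written exactly as a linear combination of the two atomic filters, so only the combination coefficients need to change with the angle.

```python
import math

def gauss_dx(x, y):
    """Atomic filter 1: x-derivative of an isotropic Gaussian (unnormalised)."""
    return -x * math.exp(-(x * x + y * y) / 2.0)

def gauss_dy(x, y):
    """Atomic filter 2: y-derivative of the same Gaussian."""
    return -y * math.exp(-(x * x + y * y) / 2.0)

def steered(theta, x, y):
    """Filter rotated by theta, built as a linear combination of the atoms."""
    return math.cos(theta) * gauss_dx(x, y) + math.sin(theta) * gauss_dy(x, y)

def rotated_directly(theta, x, y):
    """Reference: evaluate gauss_dx in coordinates rotated back by theta."""
    xr = math.cos(theta) * x + math.sin(theta) * y
    yr = -math.sin(theta) * x + math.cos(theta) * y
    return gauss_dx(xr, yr)

def max_steering_error(theta, half_width=3):
    """Largest pointwise gap between the two constructions on a small grid."""
    grid = range(-half_width, half_width + 1)
    return max(
        abs(steered(theta, x, y) - rotated_directly(theta, x, y))
        for x in grid
        for y in grid
    )
```

Calling `max_steering_error(0.7)` compares the two constructions on a 7x7 grid; the gap is at the level of floating-point rounding, because the identity holds exactly for this basis.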

**Abstract**: The gold standard to discover causal relations relies on experimentation. Over the last decades, an intriguing alternative has been proposed: constraint-based causal discovery methods can sometimes infer causal relations from certain statistical patterns in purely observational data. Even though this works nicely on paper, in practice the conclusions of such methods are often unreliable. We introduce Joint Causal Inference (JCI), a novel constraint-based method for causal discovery from multiple data sets that elegantly unifies both approaches. JCI aims to combine the best of two worlds: the reliability offered by experimentation, and the flexibility of not having to perform all theoretically possible experiments. We apply JCI in a causal transfer learning problem and use it to predict how a target variable is distributed (given observations of other variables) in new experiments. We illustrate this with examples where JCI makes the correct predictions, whereas standard feature selection methods make arbitrarily large prediction errors.

**Abstract**:

In online convex optimization it is well known that certain subclasses of objective functions are much easier than arbitrary convex functions. We are interested in designing adaptive methods that can automatically get fast rates in as many such subclasses as possible, without any manual tuning. Previous adaptive methods are able to interpolate between strongly convex and general convex functions. We present a new method, MetaGrad, that adapts to a much broader class of functions, including exp-concave and strongly convex functions, but also various types of stochastic and non-stochastic functions without any curvature. For instance, MetaGrad can achieve logarithmic regret on the unregularized hinge loss, even though it has no curvature, if the data come from a favourable probability distribution. MetaGrad’s main feature is that it simultaneously considers multiple learning rates. Unlike all previous methods with provable regret guarantees, however, its learning rates are not monotonically decreasing over time and are not tuned based on a theoretically derived bound on the regret. Instead, they are weighted in direct proportion to their empirical performance on the data, using a tilted exponential weights master algorithm.
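The idea of running several learning rates side by side and weighting them by empirical performance can be sketched in one dimension as follows. This is a simplified illustration written for this page, not the authors' implementation: the geometric grid of learning rates, the uniform prior over them, the quadratic surrogate loss, the Newton-style update of each per-rate learner, and every name in the code are assumptions of the sketch. The cited papers give the actual algorithm and its guarantees.

```python
import math

def metagrad_style_1d(grad, T, D=1.0, G=1.0, n_rates=8):
    """Run T rounds on [-D, D]; grad(w) returns a (sub)gradient with |g| <= G."""
    # Hypothetical geometric grid of candidate learning rates.
    etas = [2.0 ** (-i) / (5.0 * D * G) for i in range(n_rates)]
    weights = [1.0 / n_rates] * n_rates   # uniform prior (an assumption)
    learners = [0.0] * n_rates            # one iterate per learning rate
    sum_g2 = 0.0
    iterates = []
    for _ in range(T):
        # "Tilted" average: each learner counts in proportion to weight * eta.
        norm = sum(p * e for p, e in zip(weights, etas))
        w = sum(p * e * u for p, e, u in zip(weights, etas, learners)) / norm
        iterates.append(w)
        g = grad(w)
        sum_g2 += g * g
        # Surrogate loss of each learner, measured relative to the played point w.
        losses = [e * (u - w) * g + (e * (u - w) * g) ** 2
                  for e, u in zip(etas, learners)]
        # Exponential-weights update, then renormalise.
        weights = [p * math.exp(-l) for p, l in zip(weights, losses)]
        total = sum(weights)
        weights = [p / total for p in weights]
        # Newton-style step on each surrogate, clipped back to [-D, D].
        updated = []
        for e, u in zip(etas, learners):
            curvature = 1.0 / D ** 2 + 2.0 * e ** 2 * sum_g2
            step = (e * g + 2.0 * e ** 2 * g * g * (u - w)) / curvature
            updated.append(max(-D, min(D, u - step)))
        learners = updated
    return iterates, weights, etas
```

For example, `metagrad_style_1d(lambda w: 2.0 * (w - 0.3), T=200, G=2.6)` runs the sketch on the quadratic loss (w - 0.3)^2 and returns the played points together with the final weight on each learning rate.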

**References**:

T. van Erven and W.M. Koolen. MetaGrad: Multiple Learning Rates in Online Learning. NIPS 2016.

W.M. Koolen, P. Grünwald and T. van Erven. Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning. NIPS 2016.

**Abstract**: Segmenting tree structures is common in several image processing applications. In medical image analysis, reliable segmentations of airways, vessels, neurons and other tree structures can enable important clinical applications. We present a method for extracting tree structures comprising elongated branches by performing linear Bayesian smoothing in a probabilistic state-space. We apply this method to segment airway trees, wherein airway states are estimated using the RTS (Rauch-Tung-Striebel) smoother, starting from several automatically detected seed points across the volume. The RTS smoother tracks airways from seed points, providing Gaussian density approximations of the state estimates. We use the covariance of the marginal smoothed density for each airway branch to discriminate true and false positives. Preliminary evaluation shows that the presented method yields additional branches compared to baseline methods.
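For readers unfamiliar with the RTS smoother mentioned above, the following is a minimal scalar sketch: a Kalman filter forward pass followed by the Rauch-Tung-Striebel backward pass. The one-dimensional linear-Gaussian model, the parameter names `a`, `q`, `r`, `m0`, `p0`, and the default values are assumptions for illustration. The airway state in the talk is multivariate and its model is not given in the abstract.

```python
def kalman_rts_1d(ys, a=1.0, q=0.01, r=1.0, m0=0.0, p0=1.0):
    """Filter and smooth x_t = a*x_{t-1} + noise(q), y_t = x_t + noise(r).

    Returns filtered means/variances and smoothed means/variances.
    """
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    # Forward pass: standard Kalman predict and update steps.
    for y in ys:
        mp = a * m                # predicted mean
        pp = a * a * p + q        # predicted variance
        k = pp / (pp + r)         # Kalman gain
        m = mp + k * (y - mp)     # filtered mean
        p = (1.0 - k) * pp        # filtered variance
        m_pred.append(mp)
        p_pred.append(pp)
        m_filt.append(m)
        p_filt.append(p)
    # Backward pass: RTS recursion, starting from the last filtered estimate.
    m_smooth, p_smooth = m_filt[:], p_filt[:]
    for t in range(len(ys) - 2, -1, -1):
        c = p_filt[t] * a / p_pred[t + 1]   # smoother gain
        m_smooth[t] = m_filt[t] + c * (m_smooth[t + 1] - m_pred[t + 1])
        p_smooth[t] = p_filt[t] + c * c * (p_smooth[t + 1] - p_pred[t + 1])
    return m_filt, p_filt, m_smooth, p_smooth
```

The smoothed variances in `p_smooth` are the scalar counterpart of the marginal smoothed covariance the abstract refers to. A branch-level score such as the mean of `p_smooth` along a track would be one hypothetical way to turn them into a true/false-positive criterion.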

**Abstract**:

Scaling up visual category recognition to large numbers of classes remains challenging. A promising research direction is zero-shot learning, which does not require any training data to recognize new classes, but rather relies on some form of auxiliary information describing the new classes. Ultimately, this may allow the use of the textbook knowledge that humans employ to learn about new classes by transferring knowledge from classes they know well. We tackle the zero-shot learning problem by learning a compatibility function such that matching image-class embedding pairs are assigned a higher score than mismatching pairs; zero-shot classification proceeds by finding the label vector yielding the highest joint compatibility score. We propose and compare different class embeddings learned automatically from unlabeled text corpora and from expert-annotated attributes. Attribute annotations performed by humans are not readily available for most classes. On the other hand, humans have a natural ability to determine distinguishing properties of unknown objects. We use detailed visual descriptions collected from naive users as side-information for zero-shot learning, to generate images from scratch and to generate visual explanations which justify a classification decision.
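The prediction step described above can be sketched with a bilinear compatibility function, a common choice in this line of work. The bilinear form, the hand-set matrix `W`, the toy embeddings and the class names below are all assumptions for illustration. In practice `W` would be learned so that matching pairs outscore mismatching ones, and the embeddings would come from an image network and from attributes or text.

```python
def compatibility(image_emb, W, class_emb):
    """Bilinear score: image_emb^T * W * class_emb."""
    return sum(
        image_emb[i] * W[i][j] * class_emb[j]
        for i in range(len(image_emb))
        for j in range(len(class_emb))
    )

def zero_shot_predict(image_emb, W, class_embs):
    """Return the class name whose embedding scores highest with the image."""
    return max(class_embs, key=lambda name: compatibility(image_emb, W, class_embs[name]))

# Toy setup: 2-d image features, 3-d class (attribute) embeddings.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
# Embeddings of classes that had no training images (hypothetical values).
unseen_classes = {
    "zebra": [1.0, 0.0, 1.0],
    "whale": [0.0, 1.0, 0.0],
}
image = [0.9, 0.1]
prediction = zero_shot_predict(image, W, unseen_classes)
```

Because classification only needs the class embeddings, adding a new class amounts to adding one entry to `unseen_classes`; no images of that class are required.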

**References**:

Akata et al., Label Embeddings for Image Classification, TPAMI 2016

Xian et al., Latent Embeddings for Zero-Shot Classification, CVPR 2016

Akata et al., Multi-Cue Zero-Shot Learning with Strong Supervision, CVPR 2016

Xian et al., Zero-Shot Learning: The Good, the Bad and the Ugly, CVPR 2017

Karessli et al., Gaze Embeddings for Zero-Shot Learning, CVPR 2017

Reed et al., Learning Deep Representations of Fine-Grained Visual Descriptions, CVPR 2016

Reed et al., Generative Adversarial Text to Image Synthesis, ICML 2016

Reed et al., Learning What and Where to Draw, NIPS 2016

Hendricks, Akata et al., Generating Visual Explanations, ECCV 2016

**Abstract**: Causal questions abound across the empirical sciences, including basic biology, epidemiology, psychology and economics. Molecular biology is a particularly interesting area due to the ability in that field to perform interventions via diverse experimental techniques, such as measurements of gene expression levels under single gene knockouts. Such experiments are a key tool for dissecting causal regulatory relationships and provide an opportunity to validate causal discovery methods using real experimental data. In this talk, we provide results of an empirical assessment of several causal discovery algorithms using large-scale data from knockout experiments. We discuss several measures of performance defined using held-out observational and interventional data and find that while discovering system-wide causal structure remains difficult, especially when using only observational data, predicting the set of strongest causal effects is more feasible. We report that predictions of the strongest total causal effects based on a combination of interventional and observational data can be stable across performance measures and consistently outperform non-causal baselines.