Talk by Sara Magliacane

You are all cordially invited to the AMLab seminar on Tuesday November 29 at 16:00 in C3.163, where Sara Magliacane will give a talk titled “Ancestral Causal Inference”. Afterwards there are the usual drinks and snacks!

Abstract: This is a practice talk for a ~12 minutes general-audience talk at a NIPS workshop, so ideally it should require no previous knowledge on causality.

Discovering causal relations from data is at the foundation of the scientific method. Traditionally, cause-effect relations have been recovered from experimental data in which the variable of interest is perturbed, but seminal work like the do-calculus and the PC/FCI algorithms demonstrate that, under certain assumptions, it is already possible to obtain significant causal information by using only observational data.

Recently, there have been several proposals for combining observational and experimental data to discover causal relations. These causal discovery methods are usually divided into two categories: constraint-based and score-based methods. Score-based methods typically evaluate models using a penalized likelihood score, while constraint-based methods use statistical independences to express constraints over possible causal models. The advantages of constraint-based over score-based methods are the ability to handle latent confounders naturally, no need for parametric modeling assumptions and an easy integration of complex background knowledge, especially in the logic-based methods.

We propose Ancestral Causal Inference (ACI), a logic-based method that provides a comparable accuracy to the best state-of-the-art constraint-based methods, but improves on their scalability by using a more coarse-grained representation of causal information. Furthermore, we propose a method to score predictions according to their confidence. We provide some theoretical guarantees for ACI, like soundness and asymptotic consistency, and demonstrate that it can outperform the state-of-the-art on synthetic data, achieving a speedup of several orders of magnitude. We illustrate its practical feasibility by applying it on a challenging protein data set that so far had only been addressed with score-based methods.