Nicolai Meinshausen is Professor of Statistics at ETH Zurich. Before taking up his current post in 2013, he was Professor of Statistics at the University of Oxford and a post-doc at UC Berkeley. His IMS Medallion lecture, Causal discovery with confidence using invariance principles, will be delivered at the JSM in Seattle on Monday, August 10, 2015, at 2:00pm.

Causal discovery with confidence using invariance principles

I am not entirely sure why I received the great honour of a Medallion lecture, but I am rather sure it was neither for my rhetorical abilities nor for the work on the topic I want to speak about in Seattle. I will focus on the happier of the two. By choosing causal inference as a topic for the lecture and for my work over the past year, I am aware that I am entering a crowded, challenging and fascinating field. What is interesting about causal inference from a practical point of view? I think most people would agree that one of the defining advantages of causal models is (or would be) that they work equally well in new environments and settings. We should get the same predictive accuracy with a causal model, no matter whether we just observe or actively intervene on the predictors. In other words, causal models are invariant across different environments. While this aspect is well known and established, I want to show a few examples where the approach can be reversed: instead of first finding a causal model, which will then be invariant in its predictive accuracy across different environments, we can use the invariance property to infer the causal model. Given data from different environments, we can look for all models that are in fact invariant, in a suitable sense, across the environments. The causal model has to be one of them. This provides a novel way to perform causal inference. Confidence intervals for the causal coefficients follow naturally.
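To make the reversed use of invariance concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the procedure presented in the lecture. It assumes a linear model, no hidden variables, and a crude notion of invariance: after a pooled least-squares fit, the residuals should have the same mean and variance in every environment. The function names (`invariant_sets`, `plausible_causal_predictors`, `simulate`), the choice of tests and the simulated data are all illustrative assumptions, and the sketch does not produce confidence intervals.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def invariant_sets(X, y, env, alpha=0.05):
    """Return all predictor subsets whose pooled residuals look invariant."""
    n, p = X.shape
    labels = np.unique(env)
    accepted = []
    for size in range(p + 1):
        for subset in combinations(range(p), size):
            # Pooled least-squares fit of y on an intercept plus the subset.
            design = np.column_stack([np.ones(n), X[:, list(subset)]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            groups = [resid[env == e] for e in labels]
            # Crude invariance check (an assumption of this sketch):
            # equal residual means and equal residual variances.
            p_mean = stats.f_oneway(*groups).pvalue
            p_var = stats.levene(*groups).pvalue
            # Bonferroni correction over the two tests.
            if 2 * min(p_mean, p_var) > alpha:
                accepted.append(set(subset))
    return accepted


def plausible_causal_predictors(accepted):
    """Variables shared by every accepted subset (empty if none accepted)."""
    return set.intersection(*accepted) if accepted else set()


def simulate(n=1000, seed=0):
    """Hypothetical toy system X0 -> Y -> X1, observed in two environments."""
    rng = np.random.default_rng(seed)
    env = np.repeat([0, 1], n)
    shifted = env == 1
    x0 = rng.normal(size=2 * n)
    # Environment 1 intervenes on X0 (shifted mean, larger spread).
    x0[shifted] = 2.0 + 2.0 * x0[shifted]
    # The mechanism for Y is the same in both environments.
    y = 1.5 * x0 + rng.normal(size=2 * n)
    # Environment 1 also changes the noise level at X1, a child of Y.
    x1 = y + np.where(shifted, 3.0, 1.0) * rng.normal(size=2 * n)
    return np.column_stack([x0, x1]), y, env


if __name__ == "__main__":
    X, y, env = simulate()
    accepted = invariant_sets(X, y, env)
    print("accepted subsets:", accepted)
    print("plausible causal predictors:", plausible_causal_predictors(accepted))
```

In this toy system only column 0 is a cause of the response, so the subset containing just that column should pass the check in most runs, while subsets that omit it or include the child variable should be rejected. Taking the intersection of all accepted subsets is deliberately conservative: if the true causal set is accepted, the result cannot contain a non-causal variable, though it may be empty.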

For this approach to work, one needs more than just observational data, but one does not need data from carefully designed randomised studies. The data need to be observed, for example, under different and unknown interventions, or in different environments that change the noise distribution at each variable in an unknown way. While inhomogeneity of data (for example, biological experiments performed in different labs) is often perceived as a stumbling block for analysis, this opens the possibility that the inhomogeneity is actually good, at least for causal analysis. The exact form of the invariance will depend on the assumptions we are willing to make about the presence or absence of hidden variables, feedback loops and the type of interventions. I will show a few examples along with the necessary assumptions. Empirical results on biological experiments show the scope and limitations of the approach. This is joint work with Christina Heinze, Jonas Peters, Peter Buehlmann and Dominik Rothenhaeusler.