Judea Pearl is a professor of computer science and statistics at UCLA. He is a graduate of the Technion, Israel, and joined the faculty of UCLA in 1970, where he currently directs the Cognitive Systems Laboratory and conducts research in artificial intelligence, causal inference and philosophy of science. Pearl has authored several hundred research papers and three books: Heuristics (1984), Probabilistic Reasoning (1988), and Causality (2000; 2009). He is a member of the National Academy of Engineering, the American Academy of Arts and Sciences, and a Fellow of the IEEE, AAAI and the Cognitive Science Society. Pearl received the 2008 Benjamin Franklin Medal for Computer and Cognitive Science and the 2011 David Rumelhart Prize from the Cognitive Science Society. In 2012, he received the Technion’s Harvey Prize and the ACM A.M. Turing Award for the development of a calculus for probabilistic and causal reasoning. His Medallion Lecture will be at JSM on Tuesday August 6 at 2pm.

The Mathematics of Causal Inference

Recent developments in graphical models and the logic of counterfactuals have had a marked effect on the way scientists treat problems involving cause–effect relationships. Paradoxes and controversies have been resolved, slippery concepts have been demystified, and practical problems requiring causal information, long regarded as either metaphysical or unmanageable, can now be solved using elementary mathematics.

I will review concepts, principles, and mathematical tools that were found useful in this transformation, and will demonstrate their applications in several data-intensive sciences. These include questions of confounding control, policy analysis, misspecification tests, mediation, heterogeneity, selection bias, missing data and the integration of data from diverse studies.

These advances owe their development to two methodological principles: first, a commitment to understanding what reality should be like for a statistical routine to succeed; and second, a commitment to expressing that understanding in terms of data-generating models, rather than distributions of observed variables.

Data-generating models, encoded as nonparametric structural equations, have led to a fruitful symbiosis between graphs and counterfactuals, one that has unified the potential-outcome framework of Neyman, Rubin and Robins with the econometric tradition of Haavelmo, Marschak and Heckman.
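As a minimal sketch of how counterfactuals emerge from structural equations, consider a two-variable model X := U_X, Y := f_Y(X, U_Y): the counterfactual Y_x is obtained by re-evaluating the same equations, on the same exogenous background, with X held fixed. The simulation below is purely illustrative; the linear form of f_Y and all numbers are assumptions made for the example, not part of the formalism.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Toy structural model with exogenous background U = (U_X, U_Y):
    #   X := U_X
    #   Y := f_Y(X, U_Y)
    u_x = rng.normal(size=n)
    u_y = rng.normal(size=n)

    def f_y(x, u):
        # Arbitrary linear choice of f_Y, for illustration only.
        return 2.0 * x + u

    x = u_x              # the factual world
    y = f_y(x, u_y)

    # Counterfactuals re-evaluate f_Y with X forced to a value, on the very
    # same background u_y: same units, different action.
    y_if_x1 = f_y(np.ones(n), u_y)    # Y_{X=1}(u) for every unit u
    y_if_x0 = f_y(np.zeros(n), u_y)   # Y_{X=0}(u) for the same units
    print((y_if_x1 - y_if_x0).mean()) # average causal effect, exactly 2.0 here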

In this symbiosis, counterfactuals emerge as natural byproducts of structural equations and serve to formally articulate research questions of interest. Graphical models, on the other hand, are used to encode scientific assumptions in a qualitative (i.e., nonparametric) language, identify their testable implications, and determine the estimability of interventional and counterfactual research questions.
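For example, the chain X → Z → Y carries the testable implication that X and Y are independent given Z. The small simulation below, assuming linear equations and Gaussian noise purely for illustration, checks this implication through partial correlation:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50_000

    # Chain X -> Z -> Y; the linear/Gaussian form is illustrative only.
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)
    y = z + rng.normal(size=n)

    # The graph's testable implication: X is independent of Y given Z.
    # In this linear setting the partial correlation of X and Y given Z
    # should vanish, while the marginal correlation does not.
    rx = x - np.polyval(np.polyfit(z, x, 1), z)   # residual of X after regressing on Z
    ry = y - np.polyval(np.polyfit(z, y, 1), z)   # residual of Y after regressing on Z

    print(np.corrcoef(x, y)[0, 1])    # clearly nonzero
    print(np.corrcoef(rx, ry)[0, 1])  # approximately zero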

One of the major results of this development has been a complete solution to the problem of nonparametric identification of causal effects. Given data from observational studies and qualitative assumptions about how variables relate to each other causally, it is now possible to decide algorithmically whether the assumptions are sufficient for identifying the causal effects of interest, which covariates should be measured (or entered into a propensity-score routine), and what the testable implications of the model's assumptions are. “Completeness” proofs that accompany these results further assure investigators that no method can do better without resorting to stronger assumptions.
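One such identification result is the familiar backdoor adjustment: when a set Z blocks all backdoor paths from X to Y, P(y | do(x)) = Σ_z P(y | x, z) P(z). The toy simulation below (all probabilities are invented for the example) shows the naive contrast inflated by confounding, while the adjusted estimate recovers the true effect:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000

    # Confounded model: Z -> X, Z -> Y, X -> Y.
    # The true effect of do(X=1) vs do(X=0) is 0.2 by construction.
    z = rng.binomial(1, 0.5, n)
    x = rng.binomial(1, 0.2 + 0.6 * z)            # Z raises the chance of treatment
    y = rng.binomial(1, 0.1 + 0.2 * x + 0.5 * z)  # Z raises the outcome as well

    # Naive contrast is biased by the open backdoor path X <- Z -> Y.
    naive = y[x == 1].mean() - y[x == 0].mean()

    # Backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z)
    def adjusted(x_val):
        return sum(
            y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
            for z_val in (0, 1)
        )

    print(naive)                      # about 0.5, far from the true 0.2
    print(adjusted(1) - adjusted(0))  # close to the true effect, 0.2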

Another triumph of the symbiotic analysis has been the emergence of active research in nonparametric mediation problems, aiming to estimate the extent to which an effect is mediated by various pathways or mechanisms (e.g., Robins and Greenland, Pearl, Petersen and van der Laan, VanderWeele, Imai). The importance of this analysis, aside from telling us “how nature works,” lies in policy evaluation, especially in deciding which nuances of a given policy are likely to be most effective. Mediation-related questions were asked decades ago by Fisher and Cochran but, lacking the tools of graphs and counterfactuals, they could not be addressed until quite recently.
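As an illustration of this kind of analysis on simulated binary data (with parameter values chosen only to make the two pathways visible), the natural direct and indirect effects can be estimated by combining E[Y | x, m] with the mediator distributions P(m | x), in the spirit of the Mediation Formula; the model below has no unobserved confounding, which is what licenses this direct computation:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500_000

    # X -> M -> Y plus a direct path X -> Y; no unobserved confounding.
    x = rng.binomial(1, 0.5, n)
    m = rng.binomial(1, 0.3 + 0.4 * x)
    y = rng.binomial(1, 0.1 + 0.2 * x + 0.3 * m)

    p_m = lambda m_val, x_val: (m[x == x_val] == m_val).mean()
    e_y = lambda x_val, m_val: y[(x == x_val) & (m == m_val)].mean()

    # Natural direct effect: change X while M keeps its "untreated" distribution.
    nde = sum((e_y(1, mv) - e_y(0, mv)) * p_m(mv, 0) for mv in (0, 1))
    # Natural indirect effect: hold X at 0, shift M to its "treated" distribution.
    nie = sum((p_m(mv, 1) - p_m(mv, 0)) * e_y(0, mv) for mv in (0, 1))

    print(nde)  # approximately 0.2 (the direct path)
    print(nie)  # approximately 0.4 * 0.3 = 0.12 (the mediated path)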

Recent work further shows that causal analysis is necessary in applications previously thought to be the sole province of statistical estimation. Two such applications are meta-analysis and missing data.

The talk will focus on the following questions:

1. What every student should know about causal inference, and why it is not taught in Statistics 101. http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf

2. The Mediation Formula, and what it tells us about “How nature works” http://ftp.cs.ucla.edu/pub/stat_ser/r379.pdf

3. What mathematics can tell us about “external validity” or “generalizing across populations” http://ftp.cs.ucla.edu/pub/stat_ser/r372.pdf, http://ftp.cs.ucla.edu/pub/stat_ser/r387.pdf

4. When and how can sample-selection bias be circumvented http://ftp.cs.ucla.edu/pub/stat_ser/r381.pdf, http://ftp.cs.ucla.edu/pub/stat_ser/r405.pdf

5. What population data can tell us about unsuspected heterogeneity http://ftp.cs.ucla.edu/pub/stat_ser/r406.pdf

6. Why missing data is a causal problem, when parameters are estimable from partially observed data, and how (a numerical sketch follows this list) http://ftp.cs.ucla.edu/pub/stat_ser/r406.pdf
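As a sketch of item 6 (with an invented missingness mechanism, not an example taken from the paper above): suppose the causal graph asserts that the recording indicator R for Y depends on X alone, so that Y is independent of R given X. Then P(y) is recoverable from complete cases by adjusting for X, even though the raw complete-case average of Y is biased:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 300_000

    # Missingness as a causal mechanism: X -> Y and X -> R, where R = 1
    # means Y is recorded. Given the graph, Y is independent of R given X,
    # so P(y) is recoverable: P(y) = sum_x P(y | x, R = 1) P(x).
    x = rng.binomial(1, 0.5, n)
    y = rng.binomial(1, 0.2 + 0.5 * x)
    r = rng.binomial(1, 0.9 - 0.6 * x)   # treated units are recorded less often

    observed_mean = y[r == 1].mean()     # complete-case estimate, ignores why data are missing
    recovered = sum(
        y[(x == xv) & (r == 1)].mean() * (x == xv).mean() for xv in (0, 1)
    )

    print(y.mean())        # true P(Y = 1), about 0.45
    print(observed_mean)   # biased downward, about 0.325
    print(recovered)       # adjustment by X recovers the truth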

Reference: J. Pearl, Causality (Cambridge University Press, 2000; 2009).
Working papers: http://bayes.cs.ucla.edu/csl_papers.html