David Blei is the William B. Ransford Professor of Statistics and Computer Science at Columbia University, USA. He studies probabilistic machine learning and Bayesian statistics, including theory, algorithms, and applications. David has received several awards, including the 2013 ACM Prize in Computing, a 2017 Guggenheim Fellowship, a 2019 Simons Investigator Award, and the 2024 ACM/AAAI Allen Newell Award. He was co-editor-in-chief of the Journal of Machine Learning Research from 2019 to 2024. He is a fellow of the Association for Computing Machinery (ACM) and the Institute of Mathematical Statistics (IMS). This lecture will be delivered at JSM 2026 in Boston, August 1–6, 2026.


A Fresh Look at Empirical Bayes

Empirical Bayes (EB) improves the accuracy of simultaneous inference “by learning from the experience of others” (Efron, 2012). This idea reflects a blend of Bayesian and frequentist thinking, which goes back to Robbins (1956). For decades, it has been an active area of productive statistical research. See Efron (2019) and Ignatiadis and Sen (2025) for modern reviews.
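As a concrete illustration of "learning from the experience of others" (not part of the new work in this talk), here is Robbins's classical nonparametric EB estimator for Poisson means, sketched in Python:

```python
# Robbins' (1956) estimator for Poisson data. If X_i | theta_i ~ Poisson(theta_i)
# and theta_i ~ G with G unknown, the posterior mean is
#   E[theta | x] = (x + 1) * p_G(x + 1) / p_G(x),
# where p_G is the marginal pmf of X. Robbins plugs in the empirical
# frequencies N(x)/n for p_G(x), so each unit borrows strength from all others.
from collections import Counter

def robbins_poisson(counts, x):
    """Estimate E[theta | X = x] from a sample of Poisson counts."""
    freq = Counter(counts)
    if freq[x] == 0:
        raise ValueError(f"no observations with count {x}")
    return (x + 1) * freq[x + 1] / freq[x]

data = [0, 1, 1, 2, 3, 1, 0, 2]
print(robbins_poisson(data, 1))  # (1 + 1) * N(2) / N(1) = 2 * 2/3
```

No prior is ever specified: the marginal frequencies of the other observations stand in for it, which is the essence of the EB idea.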

In this lecture, I will discuss three new ideas in empirical Bayes.

1 Empirical Bayes via probabilistic symmetries

Classical EB theory focuses on latent variables that are i.i.d. draws from an unknown prior, which is then estimated from the data. Many modern statistics problems, however, feature complex structure, such as arrays, spatial processes, or covariates. How can we apply EB ideas to these settings?

In the first part of the talk, we describe a generalized approach to empirical Bayes based on the notion of probabilistic symmetry. Our method pairs a simultaneous inference problem, in which the prior is unknown, with a symmetry assumption on the joint distribution of the latent variables. Each symmetry implies an ergodic decomposition, which we use to derive a corresponding empirical Bayes method. We call this method Bayesian empirical Bayes (BEB). We show how to use this approach to extend EB with several probabilistic symmetries: (i) EB matrix recovery for arrays and graphs; (ii) covariate-assisted EB for conditional data; (iii) EB spatial regression under shift invariance. To solve the resulting computational problem, we present scalable algorithms based on variational inference and neural networks.
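For intuition about the role of ergodic decompositions, consider the simplest symmetry, exchangeability, where the decomposition is given by de Finetti's theorem (a standard fact, stated here only for orientation):

```latex
p(z_1, \dots, z_n) \;=\; \int \prod_{i=1}^{n} G(z_i) \, \pi(\mathrm{d}G),
```

so an exchangeable sequence is a mixture of i.i.d. sequences governed by a random measure $G$. Classical EB corresponds to estimating a single $G$; richer symmetries (joint exchangeability of arrays, shift invariance of spatial processes) yield analogous decompositions with different ergodic components.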

2 Empirical Bayes and simulation-based inference

Classical EB assumes that the likelihood is tractable, i.e., that we can calculate the conditional distribution of the data given the latent variable p(x|z). In many scientific applications, however, the likelihood is available only through a simulator.

In the second part of the talk, we discuss an EB approach for such implicit likelihoods. Our approach uses the idea of simulation-based inference (SBI) (Cranmer et al., 2020). Specifically, we show how to calculate EB estimates without an explicit density by using the observed data, simulator samples, and an amortized inference network. The idea is that the result of simulation-based inference provides a natural mechanism to approximate the “population posterior,” one form of the optimal EB prior. We demonstrate our method with several scientific simulators.
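To make "inference without an explicit density" concrete, here is a toy sketch of likelihood-free inference via rejection ABC, a much simpler relative of the amortized neural SBI methods discussed in the talk; all settings and names below are illustrative assumptions, not the talk's method:

```python
# Rejection ABC: the likelihood p(x | theta) is available only through a
# simulator. We draw theta from the prior, simulate data, and keep draws
# whose simulated summary statistic lands near the observed one.
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=100):
    """Implicit likelihood: we can draw data but never evaluate a density."""
    return rng.normal(theta, 1.0, size=n)

# "Observed" data generated at a hidden parameter value.
x_obs = simulator(2.0)
s_obs = x_obs.mean()  # summary statistic

accepted = []
for _ in range(5000):
    theta = rng.uniform(0.0, 5.0)              # prior draw
    if abs(simulator(theta).mean() - s_obs) < 0.2:
        accepted.append(theta)                 # simulated summary is close

posterior_mean = float(np.mean(accepted))
print(len(accepted), posterior_mean)
```

Amortized SBI replaces this wasteful accept/reject loop with a neural network trained on simulator draws, which is what makes the population-posterior approximation described above computationally feasible.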

3 Empirical Bayes for combining experimental and observational data

Finally, as an application of EB thinking, we present a new method for simultaneously analyzing randomized trials and observational studies. Randomized experiments have long been the gold standard for scientists seeking to estimate a causal effect. When experiments are limited, however, scientists often resort to observational studies for causal inference. Observational studies often come with large samples, but they rely on untestable assumptions and can be systematically biased. This leads to what Gerber et al. (2004) call the illusion of learning from observational research: absent prior information about bias, observational results cannot meaningfully improve the quality of causal inference.

To shatter this illusion, we take an empirical Bayes perspective. We show that the distribution of observational biases can be learned from calibration studies: carefully designed studies in which the causal effect is known a priori to be zero. Calibration identifies the distribution of observational bias and allows observational studies to share meaningful information about the causal effect. We show that, with an increasing number of calibration and observational studies, both the bias distribution and the causal effect can be consistently recovered. These ideas are joint work with Sebastian Salazar, Diana Cai, Don Green, Xinwei Shen, Sebastian Wagner-Carena, Bohan Wu, and Cheng Zhang.
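The mechanics of calibration can be sketched with a deliberately simple toy (illustrative numbers only, and only the mean bias is learned here, not its full distribution as in the talk):

```python
# Calibration studies target an effect known a priori to be zero, so their
# estimates directly reveal the observational bias. We learn the bias from
# them and debias the observational estimates of the effect of interest.
import statistics

# Estimates from calibration studies (true causal effect = 0).
calibration_estimates = [0.5, 0.7, 0.6]
bias_hat = statistics.mean(calibration_estimates)   # learned mean bias: 0.6

# Estimates from observational studies of the effect of interest.
observational_estimates = [1.6, 1.7]
debiased = [est - bias_hat for est in observational_estimates]
effect_hat = statistics.mean(debiased)
print(round(effect_hat, 3))  # 1.05
```

The full method models the bias as a random draw per study and recovers its distribution, which is what lets the consistency result hold as the number of calibration and observational studies grows.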

References

Cranmer, K., Brehmer, J., and Louppe, G. (2020). The frontier of simulation-based inference. Proc. Natl. Acad. Sci. USA, 117(48):30055–30062.

Efron, B. (2012). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge Univ. Press.

Efron, B. (2019). Bayes, oracle Bayes, and empirical Bayes. Statist. Sci., 34(2):177–201.

Gerber, A., Green, D., and Kaplan, E. (2004). The illusion of learning from observational research. In Shapiro, I., Smith, R., and Massoud, T., eds, Problems and Methods in the Study of Politics. Cambridge Univ. Press.

Ignatiadis, N. and Sen, B. (2025). Empirical Bayes: From Herbert Robbins to Modern Theory and Applications.

Robbins, H. (1956). An empirical Bayes approach to statistics. In Proc. Berkeley Symp. Math. Statist. and Probab., pages 131–148.