Bodhi Sen is Professor and Chair of Statistics at Columbia University, New York. He received his PhD in Statistics from the University of Michigan, Ann Arbor, in 2008. Prior to that, he was a student at the Indian Statistical Institute, Kolkata, where he received his Bachelor's (2002) and Master's (2004) degrees in Statistics. His core statistical research centers on nonparametrics: function estimation (with special emphasis on shape-constrained estimation), the theory of optimal transport and its applications to statistics, empirical Bayes procedures, kernel methods, and likelihood- and bootstrap-based inference. His honors include the NSF CAREER award (2012) and the Young Statistical Scientist Award (YSSA) in the Theory and Methods category from the International Indian Statistical Association (IISA). He is an IMS Fellow. This Medallion Lecture will be given at the IMS Annual Meeting in Salzburg, July 6–9, 2026.
Wasserstein–Cramér–Rao Theory of Unbiased Estimation
Statistics has a long romance with geometry. As a parameter varies, a model traces a path through probability distributions, and classical theory measures that path using the Fisher–Rao (Rao, 1945) geometry, closely tied to the Hellinger distance (Amari and Nagaoka, 2000). The Fisher information is the metric that this local Hellinger geometry induces on parameter space, while differentiability in quadratic mean (van der Vaart, 1998) expresses the smoothness of the resulting path of distributions. From this viewpoint, the key step in the standard proof of the Cramér–Rao inequality — what a statistician might regard simply as differentiation under the integral sign — can be understood as a consequence of the absolute continuity of the model in that geometry, a theme that will reappear throughout the talk.
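A textbook way to make the first of these links precise (standard material, cf. van der Vaart, 1998; not specific to the lecture): under differentiability in quadratic mean, the squared Hellinger distance along the path is locally quadratic, with curvature given by the Fisher information.

```latex
% Differentiability in quadratic mean: the path theta -> sqrt(p_theta) is
% differentiable in L^2 with derivative (1/2) sqrt(p_theta) \dot\ell_theta,
% and the Hellinger distance is locally quadratic with curvature given by
% the Fisher information I(theta) = E[\dot\ell_theta^2]:
\sqrt{p_{\theta+h}} = \sqrt{p_\theta}\left(1 + \tfrac{1}{2}\, h\, \dot\ell_\theta\right) + o(h)
  \quad \text{in } L^2,
\qquad
H^2(P_\theta, P_{\theta+h})
  = \tfrac{1}{2}\int \left(\sqrt{p_{\theta+h}} - \sqrt{p_\theta}\right)^2
  = \tfrac{h^2}{8}\, I(\theta) + o(h^2).
```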
The classical theory of unbiased estimation is organized around variance, which measures the instability of an estimator across independent resamples from the same distribution. That is a natural benchmark, but it is not the only one. For an estimator T(X_1, …, X_n), we propose a different measure of instability, which we call the sensitivity of T:

𝕊(T) = E[ ∑_{i=1}^{n} ‖∇_{x_i} T(X_1, …, X_n)‖² ].

This quantity measures the total expected effect on the estimator T of infinitesimal perturbations of the sample points. In problems with moving support, boundary effects, or small data perturbations, this kind of stability can matter just as much as variance.
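To make the definition concrete, consider the sample mean: each partial derivative ∇_{x_i} T equals 1/n, so 𝕊(T) = n · (1/n)² = 1/n. A minimal numerical check (our own illustration, not code from the paper), approximating the gradients by central differences:

```python
import numpy as np

rng = np.random.default_rng(0)

def sensitivity(T, X, eps=1e-6):
    """Finite-difference evaluation of sum_i (dT/dx_i)^2 for one sample;
    averaging this over resamples estimates the sensitivity S(T)."""
    total = 0.0
    for i in range(len(X)):
        Xp, Xm = X.copy(), X.copy()
        Xp[i] += eps
        Xm[i] -= eps
        total += ((T(Xp) - T(Xm)) / (2 * eps)) ** 2
    return total

n = 50
est = np.mean([sensitivity(np.mean, rng.uniform(0, 1, n)) for _ in range(200)])
print(est, "vs the analytic value 1/n =", 1 / n)  # both ~ 0.02
```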
Once sensitivity becomes the quantity of interest, Wasserstein geometry enters naturally. In this lecture I revisit three classical achievements—the Cramér–Rao lower bound, exact efficiency in exponential families, and asymptotic efficiency of maximum likelihood—and ask: what becomes of this picture when Fisher–Rao/Hellinger geometry is replaced by Wasserstein geometry?
A parallel theory emerges. In much the same way that variance is tied to Fisher–Rao/Hellinger geometry, sensitivity is tied to Wasserstein geometry. There is a Wasserstein–Cramér–Rao lower bound, which gives a universal lower bound on the sensitivity of unbiased estimators in terms of a Wasserstein information matrix (Li and Zhao, 2023). The bound applies to models that are absolutely continuous in Wasserstein space, or, equivalently, whose local evolution is described by the continuity equation (Ambrosio, Gigli, and Savaré, 2008). An exact-efficiency theory follows from the equality case of the Cauchy–Schwarz inequality, giving rise to transport families—the Wasserstein analogue of exponential families. And there is an asymptotically sensitivity-efficient estimation strategy: the Wasserstein projection estimator, obtained by projecting the empirical measure onto the model in Wasserstein distance, just as maximum likelihood may be viewed as a kind of KL projection.
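For orientation, here is the one-parameter version of the objects just named, in the spirit of Li and Zhao (2023); the potential Φ_θ generating the velocity field is standard in this setting, though the notation is ours.

```latex
% Continuity equation for the path theta -> rho_theta, driven by a velocity
% field \nabla\Phi_\theta, and the resulting Wasserstein information:
% the energy of that field (one-parameter sketch, following Li and Zhao, 2023).
\partial_\theta \rho_\theta + \nabla\cdot\!\left(\rho_\theta \nabla\Phi_\theta\right) = 0,
\qquad
I_W(\theta) = \int \|\nabla\Phi_\theta(x)\|^2\, \rho_\theta(x)\, dx .
```

For instance, for a location family ρ_θ(x) = ρ(x − θ) the velocity field is identically 1, so I_W(θ) = 1 for every ρ, in contrast with the Fisher information, which depends on the shape of ρ.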
One simple example where the theory becomes especially vivid is the family Uniform[0, ϑ], ϑ > 0. This model lies outside the usual classical Cramér–Rao framework because its support changes with ϑ, so the standard score-based regularity condition fails. In Wasserstein geometry, however, the family is absolutely continuous, and one can derive a meaningful lower bound on the sensitivity of any unbiased estimator of ϑ. The example exposes a sharp tension between variance and sensitivity. The maximum likelihood estimator, namely the sample maximum, has excellent variance but is highly sensitive to infinitesimal perturbations: a single nudge to the largest observation moves the estimate by the same amount. By contrast, the Wasserstein projection estimator has an explicit form as a weighted average of order statistics and is asymptotically sensitivity-efficient.
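A minimal numerical sketch of this tension (our own illustration, not code from the paper): the closed form used for the projection estimator below is our derivation from the quantile-function formula for W₂ on the line, and the printed reference value 1/(n·I_W) assumes the bound takes the classical Cramér–Rao form, with I_W = 1/3 for this family by our computation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sensitivity(T, X, eps=1e-6):
    """sum_i (dT/dx_i)^2 for one sample, via central differences."""
    out = 0.0
    for i in range(len(X)):
        Xp, Xm = X.copy(), X.copy()
        Xp[i] += eps
        Xm[i] -= eps
        out += ((T(Xp) - T(Xm)) / (2 * eps)) ** 2
    return out

def w2_projection(X):
    # argmin over theta of W2^2(empirical measure, Uniform[0, theta]),
    # computed through quantile functions; the minimizer is a weighted
    # average of order statistics (our derivation; the normalization may
    # differ from the paper's):
    #   theta_hat = (3 / (2 n^2)) * sum_i (2 i - 1) X_(i)
    n = len(X)
    i = np.arange(1, n + 1)
    return 3.0 / (2 * n**2) * np.sum((2 * i - 1) * np.sort(X))

theta, n, reps = 1.0, 100, 500
S_max = np.mean([sensitivity(np.max, rng.uniform(0, theta, n)) for _ in range(reps)])
S_w2 = np.mean([sensitivity(w2_projection, rng.uniform(0, theta, n)) for _ in range(reps)])

# The max responds one-to-one to its argmax coordinate and ignores the rest,
# so its sensitivity is 1 for every n; the projection estimator's vanishes.
print("MLE (sample max):       ", S_max)  # = 1 for every n
print("W2 projection estimator:", S_w2)   # ~ 3/n
print("1/(n * I_W) with I_W = 1/3 (assumed classical form of the bound):", 3 / n)
```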
More broadly, the lecture is an invitation to rethink efficiency geometrically. Variance is a powerful notion of instability, but it is not always the one most relevant to modern inferential problems. When stability under small perturbations is itself part of what we want from an estimator, Wasserstein geometry offers a natural alternative framework.
This is joint work with Nicolás García Trillos (University of Wisconsin–Madison) and Adam Jaffe (Columbia University), based on our paper “Wasserstein–Cramér–Rao Theory of Unbiased Estimation” (García Trillos, Jaffe and Sen, 2025).
References
Amari, S. and Nagaoka, H. (2000). Methods of Information Geometry. Translations of Mathematical Monographs, Vol. 191. AMS/OUP.
Ambrosio, L., Gigli, N. and Savaré, G. (2008). Gradient Flows in Metric Spaces and in the Space of Probability Measures. 2nd edn. Birkhäuser.
García Trillos, N., Jaffe, A. and Sen, B. (2025). Wasserstein–Cramér–Rao Theory of Unbiased Estimation. Preprint, arXiv:2511.07414.
Li, W. and Zhao, J. (2023). Wasserstein information matrix. Information Geometry, 6:203–255. doi:10.1007/s41884-023-00099-9.
Rao, C.R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37:81–91.
van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press.