Cun-Hui Zhang, Distinguished Professor of Statistics at Rutgers University, is a Fellow of the Institute of Mathematical Statistics and of the American Statistical Association. His research interests include high-dimensional data, machine learning, empirical Bayes, time series, nonparametric methods, multivariate analysis, survival data and biostatistics, functional MRI, closed-loop diabetes control, and network tomography.
This Blackwell Lecture will be delivered at the IMS 2026 meeting, in Salzburg, July 6–9, 2026. See the program at https://ims2026.github.io/IMS2026/program.html
Empirical Bayes for Dependent Data
Empirical Bayes is founded on a simple but powerful idea: when many related statistical decision problems are observed together, pooling information across them can substantially outperform procedures that treat each problem separately. Since its introduction more than 75 years ago, this principle has had a profound impact on statistical thinking and practice. Yet most of its classical development has focused on independent observations. In contrast, modern applications, ranging from spatial epidemiology to large-scale digital platforms, increasingly generate compound decision problems with dependent data. In such settings, the value of pooling is often evident, but the key challenge is how to do so effectively without detailed knowledge of the underlying dependence structure.
A key observation, present since the inception of empirical Bayes but often overlooked, is that the fundamental theorem connecting compound decision problems to the oracle Bayes formulation does not require independence. In this setting, the oracle prior is the empirical distribution of the unknown parameters across related decision problems, representing the ideal target for information pooling. Building on this observation, we develop a marginal likelihood framework for empirical Bayes under dependence.
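As a toy illustration of this observation (a simulation sketch, not the lecture's construction): hold the parameter vector fixed, and compare coordinate-wise estimation against an oracle that knows the empirical distribution of the parameters and applies the corresponding Bayes rule to each coordinate. The three-point parameter distribution below is an arbitrary choice for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (unknown) parameters across n related problems; their empirical
# distribution plays the role of the oracle prior.
theta = rng.choice([-2.0, 0.0, 3.0], size=2000, p=[0.3, 0.5, 0.2])
x = theta + rng.standard_normal(theta.size)  # X_i ~ N(theta_i, 1)

# Oracle Bayes rule: the posterior mean under the empirical distribution
# of the theta_i's (the oracle is handed this distribution directly).
support, counts = np.unique(theta, return_counts=True)
prior = counts / counts.sum()

def oracle_posterior_mean(xi):
    # Gaussian likelihood weights at each support point of the oracle prior
    w = prior * np.exp(-0.5 * (xi - support) ** 2)
    return (w * support).sum() / w.sum()

est = np.array([oracle_posterior_mean(xi) for xi in x])
mse_oracle = np.mean((est - theta) ** 2)
mse_mle = np.mean((x - theta) ** 2)  # treating each problem separately
print(mse_oracle < mse_mle)          # pooling beats separate estimation
```

Nothing here requires the observations to be independent: the oracle target is defined purely through the empirical distribution of the fixed parameters, which is exactly why the compound-to-Bayes connection survives dependence.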
In this marginal likelihood approach, the effect of dependence can be quantified through the largest eigenvalue of the correlation matrix of the data. This quantity serves as a discount factor on the effective sample size, measuring how much dependence reduces the usable information available for pooling. Importantly, for the nonlinear procedures we propose, the dependence conditions required are no stronger than those needed for linear estimators. Thus, substantial empirical Bayes gains remain achievable even in broad dependent settings.
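A minimal numerical sketch of the discount factor, under an assumed equicorrelation structure chosen purely for illustration: for an equicorrelated correlation matrix the largest eigenvalue is 1 + (n-1)ρ, so even mild correlation can shrink the effective sample size considerably.

```python
import numpy as np

n, rho = 500, 0.1
# Equicorrelated correlation matrix: R = (1 - rho) I + rho 11^T
R = np.full((n, n), rho) + (1 - rho) * np.eye(n)

lam_max = np.linalg.eigvalsh(R)[-1]  # largest eigenvalue, here 1 + (n-1)*rho
n_eff = n / lam_max                  # discounted effective sample size
print(round(lam_max, 4), round(n_eff, 1))  # prints: 50.9 9.8
```

With ρ = 0.1, the 500 correlated observations carry roughly the pooling information of about 10 independent ones, which is the sense in which the largest eigenvalue acts as a discount on the effective sample size.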
For Gaussian mean estimation, the proposed approach achieves nearly parametric rates for estimating the score function, yielding regret relative to the oracle estimator of nearly reciprocal sample-size order. Parallel to Stein’s unbiased risk estimate (SURE), the method also provides an estimator of the compound risk with bias of nearly reciprocal sample-size order, enabling confidence regions for the full mean vector. Beyond point estimation, the marginal likelihood framework automatically provides consistent estimators of the oracle prior, posterior distributions, and Bayes credible intervals. Although the general convergence rates are logarithmic for these more challenging targets, the marginal likelihood method adapts to atomic sparsity in the oracle prior, achieving substantially faster nonparametric rates when the oracle prior has relatively few support points.
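One way to see the role of the score function here (a simplified independent-data sketch using Tweedie's formula with an illustrative kernel density estimate, not the marginal likelihood procedure described above): the oracle posterior mean for a Gaussian mean is x plus the score of the marginal density at x, so a good estimate of the score translates directly into small regret. The two-point prior and the bandwidth below are arbitrary simulation choices.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.choice([0.0, 4.0], size=5000, p=[0.8, 0.2])
x = theta + rng.standard_normal(theta.size)  # X_i ~ N(theta_i, 1)

# Tweedie's formula: E[theta | x] = x + (d/dx) log f(x), where f is the
# marginal density of X. Estimate the score f'/f with a Gaussian kernel
# density estimate (a crude stand-in for a fitted marginal likelihood).
h = 0.5  # illustrative bandwidth

def score(xi):
    w = np.exp(-0.5 * ((xi - x) / h) ** 2)   # unnormalized kernel weights
    f = w.mean()                             # (constants cancel in the ratio)
    fprime = (w * (x - xi) / h ** 2).mean()  # derivative of the KDE at xi
    return fprime / f

est = x + np.array([score(xi) for xi in x])
mse_eb = np.mean((est - theta) ** 2)
mse_mle = np.mean((x - theta) ** 2)
print(mse_eb, mse_mle)
```

The kernel plug-in shrinks each observation toward the nearby mode of the marginal density, which is why its compound risk falls below that of the coordinate-wise maximum likelihood estimator in this sparse two-point setting.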
The framework is also robust to model misspecification. In the Gaussian setting, the theory requires only pairwise Gaussian assumptions rather than a full joint Gaussian model, broadening applicability when Gaussianity is only approximate. Beyond Gaussian models, the marginal likelihood approach also extends naturally to dependent Poisson observations, where the largest eigenvalue of the correlation matrix again serves as the effective sample-size discount factor.
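For the Poisson case, the classical Robbins rule for independent data gives a concrete sense of what marginal-based pooling buys; the Gamma distribution for the means below is an arbitrary simulation choice, and this is not the dependent-data procedure of the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.gamma(2.0, 1.0, size=10000)  # unknown Poisson means
x = rng.poisson(theta)                   # X_i ~ Poisson(theta_i)

# Robbins' empirical Bayes rule for Poisson data:
# E[theta | X = x] = (x + 1) * f(x + 1) / f(x), with f the marginal pmf,
# estimated here by the empirical frequencies of the observed counts.
counts = np.bincount(x, minlength=x.max() + 2)
f = counts / x.size
est = (x + 1) * f[x + 1] / f[x]  # f[x] > 0 since each x was observed

mse_eb = np.mean((est - theta) ** 2)
mse_mle = np.mean((x - theta) ** 2)
print(mse_eb, mse_mle)
```

The rule needs only the marginal frequencies, not the means themselves; in this simulation its compound risk is well below that of using each count as its own estimate.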
Taken together, these results significantly broaden the scope of empirical Bayes, extending it from classical independent settings to modern dependent data while preserving much of its statistical efficiency. More broadly, they show that the power of information pooling remains robust under dependence, provided that dependence is properly quantified and controlled.