Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a full professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in nonparametric and high-dimensional statistics; he has developed methods and theory for shape-constrained inference, missing data, subgroup selection, data perturbation techniques (random projections, subsampling, the bootstrap, knockoffs), changepoint estimation and independence testing. Richard currently holds a European Research Council Advanced Grant. He received the COPSS Presidents’ Award in 2018, was elected a Fellow of the Royal Society in 2021 and served as co-editor of the Annals of Statistics (2019–21). Richard’s Wahba Lecture will be at JSM Nashville, August 2–8, 2025.
Nonparametric inference under shape constraints: past, present and future
Traditionally, we think of statistical methods as being divided into parametric approaches, which can be restrictive but where estimation is typically straightforward (e.g. using maximum likelihood), and nonparametric methods, which are more flexible but often require careful choices of tuning parameters. Nonparametric inference under shape constraints sits somewhere in the middle, seeking in some ways the best of both worlds. The origins of the field date back to the work of Grenander in 1956, who proved that there exists a unique maximum likelihood estimator of a decreasing density on the non-negative half-line, and was able to characterise it explicitly. Thus, even though the class of decreasing densities is infinite-dimensional, statistical estimation can proceed in a familiar fashion, with no tuning parameters to choose. I’ll discuss some of the key properties of the Grenander estimator.
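To make Grenander’s characterisation concrete: the estimator is the left derivative of the least concave majorant of the empirical distribution function, so it can be computed exactly by a convex hull scan. The following is a minimal illustrative implementation (not code from the talk), returning the knots and the piecewise-constant density values.

```python
import numpy as np

def grenander(x):
    """Grenander estimator: the maximum likelihood decreasing density on [0, inf).

    Computed as the left derivatives of the least concave majorant (LCM)
    of the empirical CDF, via an upper convex hull scan.  Returns the
    knots and the constant density value on each knot interval.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Vertices of the empirical CDF, prepending the origin.
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]  # indices of vertices currently on the concave majorant
    for i in range(1, n + 1):
        # Pop the last vertex while it lies on or below the chord to vertex i.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            if (py[k] - py[j]) * (px[i] - px[j]) <= (py[i] - py[j]) * (px[k] - px[j]):
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    heights = np.diff(py[hull]) / np.diff(knots)  # LCM slopes = density values
    return knots, heights
```

On a sample such as np.random.exponential(size=100), the returned heights are automatically non-increasing and integrate to one against the knot spacings, illustrating that no tuning parameter needs to be chosen.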
Fast forward fifty years, and attention turned to the family of log-concave densities, i.e. those densities f for which log f is concave. This definition works equally well in d dimensions as in the univariate case, and moreover the class is closed under marginalisation, conditioning, convolution and linear transformations, making it a very natural infinite-dimensional generalisation of the class of Gaussian densities. In the 2000s, it therefore became a central class within the area. Once again, a unique maximum likelihood estimator exists, but since its characterisation is now less explicit, considerable research effort has been devoted to its efficient computation.
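For intuition about the computation, one can use a standard device from the literature: maximising the objective (1/n) Σ_i φ(X_i) − ∫ e^φ over concave functions φ yields φ̂ = log f̂ with ∫ e^{φ̂} = 1 automatically. The sketch below discretises φ on a grid and replaces the integral by a crude Riemann sum, so it only illustrates the convex optimisation structure; the grid size and the snapping of data to grid points are arbitrary choices here, and dedicated algorithms (such as those in the R packages logcondens and LogConcDEAD) are far more accurate and efficient.

```python
import cvxpy as cp
import numpy as np

def logconcave_mle_1d(x, grid_size=200):
    """Sketch of the univariate log-concave MLE as a convex program.

    Parametrises phi = log f on a grid, imposes concavity through
    non-positive second differences, and maximises
    (1/n) * sum_i phi(X_i) - integral exp(phi),
    whose maximiser integrates to one automatically.  The Riemann sum
    and nearest-grid-point snapping are crude approximations, so this
    is illustrative rather than production code.
    """
    x = np.sort(np.asarray(x, dtype=float))
    grid = np.linspace(x[0], x[-1], grid_size)
    delta = grid[1] - grid[0]
    phi = cp.Variable(grid_size)
    constraints = [cp.diff(phi, 2) <= 0]  # concavity of phi on the grid
    # Snap each observation to its nearest grid point.
    idx = np.clip(np.round((x - grid[0]) / delta).astype(int), 0, grid_size - 1)
    loglik = cp.sum(phi[idx]) / x.size - delta * cp.sum(cp.exp(phi))
    cp.Problem(cp.Maximize(loglik), constraints).solve()
    return grid, np.exp(phi.value)
```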
The period from roughly 2010 to the early 2020s saw rapid and exciting developments in our understanding of log-concave density estimation and related shape-constrained estimation problems such as isotonic (i.e. monotone) or convex regression. For instance, it is now known that shape-constrained estimators often not only attain minimax optimal rates of convergence, but also adapt to achieve faster (sometimes even parametric) rates over subclasses of densities with additional structure. Moreover, there are some fascinating connections with information theory, and in particular the theory of entropy-maximising distributions.
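Isotonic regression offers perhaps the simplest example of both phenomena: the least squares estimator under a monotonicity constraint is computable in linear time by the pool adjacent violators algorithm, attains the minimax rate over bounded monotone signals, and adapts to faster rates when the true signal is piecewise constant. A minimal illustrative implementation (not code from the talk):

```python
import numpy as np

def pava(y, w=None):
    """Pool adjacent violators: least squares non-decreasing fit to y."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, counts = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); counts.append(1)
        # Merge adjacent blocks while the monotonicity constraint is violated.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, c2 = means.pop(), weights.pop(), counts.pop()
            m1, w1, c1 = means.pop(), weights.pop(), counts.pop()
            wt = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wt)
            weights.append(wt); counts.append(c1 + c2)
    return np.repeat(means, counts)
```

Each merged block is fitted by its weighted mean, so the output is the projection of y onto the cone of non-decreasing sequences, again with no tuning parameters.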
The last couple of years have witnessed a significant broadening of the scope of shape-constrained ideas and techniques, moving beyond vanilla density estimation and regression problems, so that they are now incorporated into more elaborate tasks. I will give three brief illustrations, in problems of subgroup selection, conditional independence testing and linear regression. Subgroup selection refers to the post-selection inference challenge, common in clinical trials, where we observe covariate-response pairs and seek to identify a subset of the covariate domain on which our (isotonic) regression function exceeds a pre-specified threshold. In conditional independence testing, the fundamental hardness result of Shah and Peters in 2020 showed that when Z is continuous, no test of the conditional independence of X and Y given Z can have power greater than its size at any alternative; shape constraints provide a natural way to restrict the null hypothesis so as to facilitate viable tests. Finally, in linear regression, we show that one can employ antitonic score matching to obtain an estimator achieving optimal asymptotic variance among all convex M-estimators.
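As a flavour of the first vignette, the sketch below computes only the naive plug-in selection: it fits an isotonic regression and reports where the fit exceeds the threshold tau. It comes with no inferential guarantee; the point of the subgroup selection methodology is precisely to go beyond this, controlling the probability that the selected set contains covariate points where the true regression function lies below the threshold. The function name and the use of scikit-learn here are illustrative choices.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def plugin_subgroup(x, y, tau):
    """Naive plug-in estimate of the superlevel set {x : mu(x) >= tau}
    for a non-decreasing regression function mu; no error control."""
    iso = IsotonicRegression(increasing=True).fit(x, y)
    xs = np.sort(np.asarray(x, dtype=float))
    above = iso.predict(xs) >= tau
    # Monotonicity forces the estimated set to be a right half-line,
    # so it suffices to report its left endpoint (None if empty).
    return xs[above][0] if above.any() else None
```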
One take-home message from these vignettes is that shape constraints often provide a natural alternative to smoothness conditions in nonparametric problems.
I will conclude with some open problems in the area.