Million-dollar Rousseeuw Prize for Statistics awarded to Yoav Benjamini, Daniel Yekutieli and Ruth Heller

An international and independent jury, appointed by the King Baudouin Foundation, has selected the pioneering work on the False Discovery Rate (FDR) for the prestigious biennial Rousseeuw Prize for Statistics 2024. This million-dollar prize honors exceptional statistical research that profoundly influences society. The inaugural prize in 2022 celebrated advances in causal inference; this year’s award focuses on the False Discovery Rate and Methods to Control It. The 1995 paper by Benjamini and Hochberg introduced the FDR, providing a framework that subsequent publications expanded. The laureates are Yoav Benjamini, Daniel Yekutieli, and Ruth Heller of Tel Aviv University. Yosef Hochberg also deserves much recognition, but sadly he is no longer alive.

The laureates’ research has led to a method that limits the number of false discoveries without stifling the potential for true discoveries. The FDR work being honored created new concepts and methodologies that help scientists find real discoveries among many possible results, while keeping the error from false discoveries low. The need arises from a fundamental problem in science: any conclusion drawn from the analysis of data is subject to uncertainty. Commonly used statistical tools such as “statistical significance,” the “p-value,” and “confidence intervals” were therefore paired with thresholds for determining the statistical validity of a single discovery. By the middle of the 20th century, concern was raised that these thresholds lose their meaning when one compares multiple groups and selects the most promising differences after viewing the data; the level of the tests of the individual hypotheses should therefore be lowered. The new individual thresholds were set so that the probability of even one false discovery among the many, known as the family-wise error rate, is controlled at an acceptable level. Extensive research in the field of Multiple Comparisons was devoted to methods that satisfy this requirement, but their ability to find discoveries drops drastically when the number of potential discoveries is large. As a result, in many areas of science their use was only partial, and in others they were rejected outright. In areas such as genomics, where the multiplicity of results screened for discoveries grew ever larger and the danger of ignoring the effect of selection is very apparent, the question of whether or not to control for multiplicity was debated as a choice between plague and cholera.
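
To make the family-wise error rate requirement concrete, the best-known way of lowering the individual thresholds (not named in the announcement, but the classical example) is the Bonferroni correction; a sketch in standard notation:

```latex
% Family-wise error rate (FWER): probability of at least one false discovery.
% Bonferroni example: testing each of the m hypotheses at level alpha/m keeps
% the FWER at or below alpha, e.g. 10,000 tests at overall level 0.05 require
% an individual p-value below 0.000005.
\mathrm{FWER}
  \;=\; \Pr\Bigl(\bigcup_{i \,\in\, \text{true nulls}} \{\text{reject } H_i\}\Bigr)
  \;\le\; \sum_{i \,\in\, \text{true nulls}} \Pr(\text{reject } H_i)
  \;\le\; m \cdot \frac{\alpha}{m} \;=\; \alpha
```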

In 1989 Branko Sorić warned that by ignoring the multiple-testing issue and focusing on selected statistical discoveries, “a large part of science may be false.” Reading this paper, Benjamini and Hochberg realized that the proportion of false discoveries among the statistical discoveries could serve as a criterion for selection, rather than merely as a warning. Intuitively, in a study of 100 genes, if 60 association discoveries are made and three are false, that is bearable; however, if only five discoveries are made and three are false, it is clearly unacceptable. The same logic holds even if a thousand or a million potential discoveries are screened. Benjamini and Hochberg formulated the criterion mathematically as the expectation of the proportion of false discoveries, and called it the False Discovery Rate.
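
In the standard notation of the multiple-testing literature (not used in the announcement itself), with R the number of discoveries made and V the number of those that are false, the criterion reads:

```latex
% False Discovery Rate: the expected proportion of false discoveries among
% all discoveries, with the ratio taken as 0 when no discovery is made.
\mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
```

In the 100-gene example above, the realized proportions are 3/60 = 5% versus 3/5 = 60%, which is exactly the contrast the criterion is designed to capture.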

In that same paper they offered a simple method that controls the FDR using marginal p-values, now known as the Benjamini–Hochberg (BH) procedure. The threshold for discoveries adapts to the data at hand: it can be as stringent as family-wise error rate control requires when very few potential discoveries are apparent, yet as permissive as ignoring multiplicity altogether when many clear discoveries exist.
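
A minimal sketch of the step-up procedure as it is usually described (the function name and example p-values below are illustrative, not taken from the announcement): sort the p-values, find the largest rank k at which the k-th smallest p-value falls below k·q/m, and declare those k hypotheses discoveries.

```python
import numpy as np

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure (minimal illustrative sketch).

    Returns a boolean array marking which hypotheses are declared
    discoveries, controlling the false discovery rate at level q
    when the p-values are independent.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices of the p-values, smallest first
    sorted_p = p[order]
    thresholds = q * np.arange(1, m + 1) / m  # compare p_(i) with i*q/m
    passing = np.nonzero(sorted_p <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size > 0:
        k = passing[-1]                       # largest rank i with p_(i) <= i*q/m
        reject[order[:k + 1]] = True          # the k smallest p-values are discoveries
    return reject

# Illustrative example: ten p-values, FDR controlled at 5%
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.360]
print(benjamini_hochberg(pvals, q=0.05))
```

The data-adaptive threshold i·q/m ranges from q/m (the Bonferroni-like level) when only a single discovery survives, up to q itself when essentially all hypotheses are rejected, matching the behaviour described above.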

The paper encountered serious objections from reviewers because it did not fit the family-wise error rate paradigm, so only after three journals and five years was the first part published (1995). The second part, which offered an even more adaptive method by estimating the number of true null hypotheses, appeared only in 2000, ten years after it was first submitted. In later work Benjamini and Hochberg further allowed the potential discoveries to carry different weights expressing their varying importance, by incorporating the weights in the definition of the FDR and in the BH procedure.

The work of Benjamini and Yekutieli (2001) extended the theoretical foundation of the BH procedure, allowing its use in the important setting of positively dependent test statistics. They also suggested a modification of the procedure that allows one to use it under any form of dependence, sometimes referred to as the Benjamini–Yekutieli procedure. Together with Abba Krieger they gave a theoretical foundation for it.
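
Under arbitrary dependence, the modification amounts to running the same step-up procedure at a more conservative level, namely q divided by the harmonic sum 1 + 1/2 + … + 1/m; a minimal sketch, reusing the illustrative benjamini_hochberg function from the sketch above:

```python
import numpy as np

def benjamini_yekutieli(pvalues, q=0.05):
    """Benjamini-Yekutieli variant (illustrative sketch): valid under
    arbitrary dependence among the test statistics.

    Identical to the BH step-up procedure, but run at the more
    conservative level q / (1 + 1/2 + ... + 1/m).
    """
    m = len(pvalues)
    harmonic = np.sum(1.0 / np.arange(1, m + 1))
    return benjamini_hochberg(pvalues, q=q / harmonic)  # defined in the earlier sketch
```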

In later work Benjamini and Yekutieli proposed the False Coverage Rate (FCR) criterion for selected confidence intervals, and suggested such confidence intervals for a general class of selection rules. When the selection is made according to the BH procedure, the conclusions from their intervals match those of the BH procedure. Equally importantly, that work clarified a concept common to testing and confidence intervals: given a selection procedure based on the observed data, the statistical guarantees relevant for a single discovery, such as coverage or type I error, are to hold on average over the selected discoveries. As challenges in more complex behavioral genomic research were encountered, this general point of view was adapted to hierarchical testing of trees of hypotheses by Yekutieli and Benjamini, and later by Marina Bogomolov.
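
A minimal sketch of how FCR-adjusted intervals are typically formed under the Benjamini–Yekutieli proposal, assuming approximately normal estimates (the function name and the normal approximation are illustrative assumptions, not taken from the announcement): after selecting |S| of the m parameters, each selected parameter receives a marginal interval at level 1 − q·|S|/m.

```python
import numpy as np
from scipy.stats import norm

def fcr_adjusted_intervals(estimates, std_errors, selected, m, q=0.05):
    """False Coverage Rate adjusted intervals (illustrative sketch).

    `selected` holds the indices of the |S| parameters chosen out of m.
    Each selected parameter gets a marginal interval at confidence level
    1 - q*|S|/m, so that, per the FCR criterion, the expected proportion
    of constructed intervals that miss their parameter stays at or below q
    (under the conditions of the original proposal).
    """
    n_selected = len(selected)
    level = 1.0 - q * n_selected / m
    z = norm.ppf(0.5 + level / 2.0)   # two-sided normal quantile for that level
    est = np.asarray(estimates, dtype=float)[selected]
    se = np.asarray(std_errors, dtype=float)[selected]
    return np.column_stack([est - z * se, est + z * se])
```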

Entering the current century, science went through a process of industrialization. Experiments in genomics or proteomics are run on high-throughput machines, and the outcomes are processed automatically, yielding many potential discoveries. An important area where technical developments increased the number of potential discoveries is brain research. In functional brain imaging (fMRI), thousands of locations in the brain are tested for association with some thought process, say, recognizing an upside-down face. However, interest lies in discoveries of active regions rather than individual locations. Heller and Benjamini developed the theoretical and practical means to identify active regions while controlling the FDR at the level of regions. In genomic research, the number of genes whose differential expression is studied reached tens of thousands, and the number of locations on the genome is in the millions. This field was initially prone to non-replicable discoveries, and again work by Heller and Benjamini, later joined by Yekutieli, paved the way for methods that control the proportion of non-replicable results among those declared replicable.

Thanks to advances in computing and databases, the FDR is increasingly used in many other fields of science, such as agriculture, astronomy, behavioral science, and economics.

Motivated by wavelet analysis, joint work of Benjamini with Abramovich, Donoho, and Johnstone demonstrated that the FDR and the BH procedure are relevant, and even asymptotically optimal, for the estimation of sparse signals. Jointly with Gavrilov, they showed their relevance for model selection in linear regression. Model selection methods that rely on the FDR criterion without involving the BH procedure, such as the knockoff filter, have also been developed.

Outside of their collaboration with Benjamini, Yekutieli and Heller have made contributions to FDR research and, more generally, to the area of multiple testing. Yekutieli exposed the difference between situations where a Bayesian argument excuses ignoring the multiplicity problem and those where it does not, and offered adjusted Bayesian inference for the latter. The FCR criterion has further been adapted in conformal inference as the criterion to control for selected prediction intervals, and the central role of the BH procedure in the selection of informative prediction intervals has been highlighted in recent work by Gazin, Heller, Marandon, and Roquain. Heller and Yekutieli have continued to explore major replicability challenges. Heller worked on hierarchical FDR testing involving conditional tests, and recently Heller and Rosset offered an optimal FDR-controlling procedure.

As it turns out, FDR-related concepts are also relevant to other schools of thought in statistics, such as empirical Bayes, led by Brad Efron and his collaborators. Research in this area is still very active, attracting many other researchers, and its importance grows together with the complexity of the scientific questions being asked. All three laureates continue to work separately, jointly, and with other researchers to expand FDR-related methodologies and meet the emerging needs of the scientific community.

The international jury appointed by the King Baudouin Foundation selected the winners from the nominations received, after a widely advertised call earlier this year. The jury consisted of its chair David Hand (Imperial College), Lutgarde Buydens (Radboud U Nijmegen), Probal Chaudhuri (Indian Statistical Institute), Roger Koenker (U of Illinois), Steve Marron (U of North Carolina), Cynthia Rudin (Duke U), Louise Ryan (U of Technology Sydney), David Steinberg (Tel Aviv U) who abstained due to proximity to the laureates, Maria-Pia Victoria-Feser (U of Bologna), and Huixia Judy Wang (George Washington U). For more information on the prize see www.rousseeuwprize.org.