Jelle Goeman is professor of biostatistics at Leiden University Medical Center. He obtained his PhD at Leiden University in 2006 under the supervision of Hans van Houwelingen and Sara van de Geer. His research interest is in high-dimensional data, in particular in testing and multiple testing problems that arise in such data, and he has made several theoretical and methodological contributions in this area. His frequent involvement in medical research, including genomics and neuroimaging, has ensured that his novel methods align well with the needs of applied researchers. His focus in these areas has been to allow researchers to postpone as many analysis decisions as possible until after seeing the data.

This Medallion Lecture will be given at the IMS Annual Meeting (in Salzburg, July 6–9, 2026).

Principles and Flexibility in Multiple Testing

Designing a multiple testing procedure is generally a difficult task. However, when the goal of the method is to control the familywise error rate there is a general principle, the closure principle, that method designers can rely on. The closure principle says that every method controlling familywise error rate is either a special case of closed testing, or it can be uniformly improved by a closed testing procedure. This reduces the complex task of designing any multiple testing procedure to the task of designing a closed testing procedure. To design such a procedure, a researcher has to choose how to combine the evidence against any subcollection of the hypotheses of interest into a single p-value, and solve a discrete numerical optimization problem.
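As a small illustration of closed testing, the sketch below takes the Bonferroni test as the local test for every subcollection and solves the resulting discrete problem by brute force over all subsets. The function names and the example p-values are assumptions made for illustration only; the enumeration is exponential in the number of hypotheses and is meant to show the principle, not to be used in practice.

```python
from itertools import combinations

def bonferroni(pvals, alpha=0.05):
    """Plain Bonferroni: reject H_i if p_i <= alpha / m."""
    m = len(pvals)
    return [i for i in range(m) if pvals[i] <= alpha / m]

def closed_testing_bonferroni(pvals, alpha=0.05):
    """Closed testing with Bonferroni local tests (brute force).

    The intersection hypothesis H_S is rejected locally when
    min_{j in S} p_j <= alpha / |S|. H_i is rejected by the closed
    procedure when every H_S with i in S is rejected locally.
    """
    m = len(pvals)
    rejected = []
    for i in range(m):
        all_rejected = True
        for size in range(1, m + 1):
            for S in combinations(range(m), size):
                if i in S and min(pvals[j] for j in S) > alpha / size:
                    all_rejected = False
                    break
            if not all_rejected:
                break
        if all_rejected:
            rejected.append(i)
    return rejected

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure, used here only for comparison."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break
    return sorted(rejected)

# Hypothetical p-values, chosen so that Holm rejects more than Bonferroni.
p = [0.001, 0.012, 0.02, 0.04, 0.5]
print(bonferroni(p))                 # [0]
print(holm(p))                       # [0, 1]
print(closed_testing_bonferroni(p))  # [0, 1]
```

In this example the closed procedure built from Bonferroni local tests rejects the same hypotheses as Holm, and one more than plain Bonferroni.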

In this presentation we show how the closure principle can be generalized from just familywise error rate control to any error rate of interest. We show that any multiple testing procedure controlling any error rate can be written as a special case of a generalization of the closed testing procedure that uses e-values instead of p-values. To design a multiple testing procedure, a method designer therefore only needs to choose how to combine the evidence against each subcollection of the hypotheses of interest into an e-value, and solve a discrete numerical optimization problem.
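To make the idea concrete for one error rate, the following is a sketch for the false discovery rate. The notation is an assumption introduced here and need not match the formulation used in the lecture: $e_S$ denotes an e-value for the intersection hypothesis of a subcollection $S$, $N$ the unknown set of true null hypotheses, and $\mathcal{R}_\alpha$ the collection of candidate rejection sets that pass the check against every $S$.

```latex
\[
  \mathrm{FDP}_S(R) = \frac{|R \cap S|}{\max(|R|, 1)}, \qquad
  \mathcal{R}_\alpha = \Bigl\{ R \subseteq \{1,\dots,m\} :
      e_S \ge \frac{\mathrm{FDP}_S(R)}{\alpha}
      \ \text{for all } S \subseteq \{1,\dots,m\} \Bigr\}.
\]
% Taking S = N in the defining condition gives, for every R in the collection,
% FDP_N(R) <= alpha * e_N. Since e_N is an e-value, E[e_N] <= 1, and therefore
\[
  \mathbb{E}\Bigl[\, \sup_{R \in \mathcal{R}_\alpha} \mathrm{FDP}_N(R) \Bigr]
  \;\le\; \alpha\, \mathbb{E}[e_N] \;\le\; \alpha .
\]
```

Under these assumptions the bound holds for the worst set in the collection, so it also holds for any set selected from the collection after seeing the data.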

The e-Closure principle is a very powerful tool for method design. It allows for relatively easy design of procedures for novel and exotic error rates. For example, if hypotheses come sequentially, a practitioner may accept one expected error per K consecutive hypotheses. Designing a valid multiple testing procedure for such an error rate is relatively easy with e-Closure.

Moreover, when we rebuild existing multiple testing procedures using e-Closure, we often obtain strictly more powerful variants of these existing methods. This was known for familywise error rate, where, e.g., Holm’s procedure can be obtained by rebuilding the Bonferroni procedure using closed testing. We demonstrate that such improvements are also possible with false discovery rate controlling methods, as we illustrate with the Benjamini–Yekutieli (BY) procedure and the e-BH method of Wang and Ramdas. The resulting improvements are typically more substantial than those obtained by Holm over Bonferroni.

Additionally, the e-Closure principle adds a novel aspect of flexibility to multiple testing. Classical multiple testing methods, such as Benjamini–Hochberg, return a single set of rejected hypotheses for which the error rate is controlled. Methods designed using e-Closure, in contrast, generally return a collection of such sets. Error control is simultaneous over all sets in the collection, implying that the researcher may use a data-driven choice of a rejected set or report several of them, and still control the error rate. This flexibility is very helpful in high-dimensional data contexts such as neuroimaging. In neuroimaging, null hypotheses correspond to voxels (3D pixels) in the brain. Rejected sets are therefore subsets of the brain. If such a subset consists of several unconnected clusters, false discovery control over the entire subset is not so interesting, but e-Closure can offer false discovery rate control for each cluster individually and simultaneously.
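The following brute-force sketch illustrates how a collection of rejected sets can arise, under assumptions that are not taken from the lecture: the evidence against each subcollection S is combined by averaging the individual e-values, and a candidate set R is accepted when, for every S, the combined e-value is at least |R ∩ S| / (α |R|). Any accepted R then has a false discovery proportion of at most α times the e-value of the true null set, which gives simultaneous false discovery rate control over the collection. The function names and e-values are hypothetical, and the enumeration is exponential in the number of hypotheses.

```python
from itertools import combinations

def nonempty_subsets(m):
    """All non-empty subsets of {0, ..., m-1} as sorted tuples."""
    for size in range(1, m + 1):
        yield from combinations(range(m), size)

def fdr_closure_sets(evals, alpha):
    """Collection of sets R passing the e-value check for every S.

    Assumption: e_S is the average of the e-values in S, which is an
    e-value for the intersection hypothesis H_S.
    R is accepted if e_S * alpha * |R| >= |R ∩ S| for all S.
    """
    m = len(evals)
    all_sets = list(nonempty_subsets(m))
    e_avg = {S: sum(evals[i] for i in S) / len(S) for S in all_sets}
    accepted = []
    for R in all_sets:
        R_set = set(R)
        if all(e_avg[S] * alpha * len(R) >= len(R_set.intersection(S))
               for S in all_sets):
            accepted.append(R)
    return accepted

def e_bh(evals, alpha):
    """e-BH: reject the k* largest e-values, where k* is the largest k
    such that the k-th largest e-value is at least m / (alpha * k)."""
    m = len(evals)
    order = sorted(range(m), key=lambda i: -evals[i])
    k_star = 0
    for k in range(1, m + 1):
        if evals[order[k - 1]] >= m / (alpha * k):
            k_star = k
    return tuple(sorted(order[:k_star]))

# Hypothetical e-values: three strong signals and two weak ones.
evals = [60.0, 50.0, 45.0, 1.5, 0.5]
collection = fdr_closure_sets(evals, alpha=0.1)
print(len(collection))                 # 7: every non-empty subset of {0, 1, 2}
print(e_bh(evals, alpha=0.1))          # (0, 1, 2)
print((0, 1) in collection, (2,) in collection)  # True True
```

In this example the e-BH rejection set is one member of the collection, but so are its subsets such as (0, 1) and (2,). If these were two separate clusters, the same guarantee would cover each of them individually, not only their union.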

Finally, since e-Closure is a single principle that can be used for all error rates, it allows cross-error rate conclusions. In fact, the control offered by e-Closure is simultaneous over all error rates. This means that researchers may switch error rates after seeing the data, moving, e.g., from familywise error rate control to false discovery rate control if the amount of signal in the data is less than expected. This data-adaptive flexibility even extends, in some cases, to a post hoc choice of the α-level at which the error rate is controlled.