Rina Foygel Barber is the Louis Block Professor of Statistics at the University of Chicago. Her research focuses on distribution-free inference, multiple testing, sparse and low-rank methods, and medical imaging. She received an M.S. in Mathematics in 2009 and a PhD in Statistics in 2012 from the University of Chicago, and was a postdoctoral research fellow at Stanford University in 2012–13 before joining the faculty at the University of Chicago in 2014. She is the recipient of a Sloan Research Fellowship (2016), an NSF CAREER award (2017), the Tweedie New Researcher Award (2017), the Peter Gavin Hall IMS Early Career Prize (2020), and the COPSS Presidents’ Award (2020).
Rina’s Medallion Lecture will be given at the IMS London meeting in June.
Distribution-free prediction: exchangeability and beyond
Distribution-free prediction is a recently developed field in statistics that seeks to provide predictive inference for the output of any estimation algorithm, without requiring assumptions on the data distribution or on the underlying algorithm. Given a regression algorithm that produces a fitted model µ̂(x) to predict Y | X = x, the goal is to construct a prediction interval that is valid (i.e., covers the test response with probability at least 1−α) regardless of the algorithm used for fitting µ̂, and regardless of whether the model µ̂ actually fits the data distribution well. Conformal prediction, pioneered by Vladimir Vovk and collaborators beginning in the late 1990s, provides exactly this type of guarantee. It can be paired with any model fitting algorithm to provide distribution-free predictive coverage, as long as the training and test data are drawn from the same distribution.
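In symbols, the validity requirement described above can be written roughly as follows. This is only a sketch of the standard statement; the notation (training points (X_i, Y_i) for i = 1, …, n, a test point (X_{n+1}, Y_{n+1}), and an interval \widehat{C}_n built from the training data) is assumed here for illustration.

```latex
% Assumed notation: (X_1,Y_1),\dots,(X_n,Y_n) are training points,
% (X_{n+1},Y_{n+1}) is the test point, and \widehat{C}_n is the
% prediction interval constructed from the training data.
\mathbb{P}\bigl\{\, Y_{n+1} \in \widehat{C}_n(X_{n+1}) \,\bigr\} \;\ge\; 1 - \alpha
\quad \text{for every distribution } P,
\text{ where } (X_1,Y_1),\dots,(X_{n+1},Y_{n+1}) \overset{\text{iid}}{\sim} P .
```

The probability is taken over the training data and the test point jointly, so the guarantee is a marginal one: it holds on average over test points rather than conditionally on a particular value of X_{n+1}.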
In this talk, I will describe two recent extensions to the conformal prediction framework. First, I will present the jackknife+ and CV+ procedures. Prediction intervals based on cross-validation (in particular, leave-one-out cross-validation, also known as the jackknife) typically achieve the right coverage level empirically, but theoretical guarantees for these methods require an assumption of algorithmic stability (namely, that the predicted value µ̂(X_{n+1}) is stable to perturbations of the training data, e.g., removing one training point). In our work, we propose a modification of the jackknife called the jackknife+, and more generally K-fold CV+ as a modification of K-fold cross-validation; both are closely related to the cross-conformal predictors of Vovk and collaborators. Interestingly, without any stability assumption the jackknife+ is guaranteed to provide at least 1−2α coverage in the worst case (rather than the target level 1−α), while under stability conditions, both the jackknife and the jackknife+ achieve ≈ 1−α coverage. This method offers a compromise between full conformal prediction, which may be computationally infeasible in many large-scale settings, and hold-out set methods (i.e., split conformal prediction), which are highly computationally efficient but result in wider prediction intervals due to data splitting.
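To make the jackknife+ construction concrete, here is a minimal sketch in plain Python. It assumes a one-dimensional covariate and uses a hand-written least-squares line as the base algorithm. The names `fit_line` and `jackknife_plus` are hypothetical, chosen for illustration, and any other fitting routine that returns a predictor could be substituted. The key difference from the ordinary jackknife is that each leave-one-out residual is combined with the prediction of the same leave-one-out model at the test point, rather than with the full-data prediction.

```python
import math

def fit_line(xs, ys):
    """Hypothetical base algorithm: 1-D least squares. Returns a predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return lambda x: y_bar + slope * (x - x_bar)

def jackknife_plus(xs, ys, x_test, alpha=0.1, fit=fit_line):
    """Sketch of a jackknife+ interval at x_test (xs and ys are Python lists)."""
    n = len(xs)
    lows, highs = [], []
    for i in range(n):
        # Refit the model with training point i held out.
        mu_minus_i = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        r_i = abs(ys[i] - mu_minus_i(xs[i]))   # leave-one-out residual
        pred = mu_minus_i(x_test)              # leave-one-out prediction at x_test
        lows.append(pred - r_i)
        highs.append(pred + r_i)
    lows.sort()
    highs.sort()
    # Lower endpoint: floor(alpha*(n+1))-th smallest of the "low" values.
    # Upper endpoint: ceil((1-alpha)*(n+1))-th smallest of the "high" values.
    k_lo = math.floor(alpha * (n + 1))
    k_hi = math.ceil((1 - alpha) * (n + 1))
    lower = lows[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = highs[k_hi - 1] if k_hi <= n else math.inf
    return lower, upper
```

Calling `jackknife_plus(xs, ys, x_test, alpha=0.1)` returns a `(lower, upper)` pair. When n is too small for the requested α, the endpoints become infinite, which is how the sketch avoids claiming more coverage than the sample can support. The loop refits the model n times, which is the computational cost that sits between full conformal prediction and a single hold-out split.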
Like conformal prediction, the jackknife+ relies on the assumption that the training and test data are drawn from the same distribution; that is, the training and test data points are exchangeable. Moreover, both methods require that the regression algorithm used for fitting µ̂, while otherwise arbitrary, treats the training data points symmetrically. In some applications, both of these conditions may be too restrictive: we may suspect that our data is not exchangeable due to phenomena such as distribution drift, and we may also wish to fit models µ̂ using regression methods that do not treat data points symmetrically, in order to correct for this potential drift. The second extension I will describe is a new framework for non-exchangeable conformal methods (including non-exchangeable versions of conformal prediction, split conformal prediction, and the jackknife+), where both of these assumptions are relaxed. First, a small randomization step in the method allows for regression algorithms that are not symmetric in the training data points i = 1, …, n, with no resulting loss of coverage if the data points are indeed exchangeable. Next, if in fact the data points are not exchangeable (e.g., due to distribution drift over time), then placing weights on the n training data points before running the method makes the final prediction interval robust to these changes in the distribution. In combination, these two new properties allow for conformal-type methods that can be deployed far beyond the exchangeable regime, using non-symmetric algorithms for more accurate estimation in settings with distribution drift, and providing robust predictive validity guarantees in the non-exchangeable setting as well.
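The weighting idea can be illustrated for the simplest case, split conformal prediction with a model that has already been fitted. The sketch below is a plain-Python illustration under stated assumptions, not the method's reference implementation: the function names are hypothetical, the weights are assumed to be fixed in advance (for example, decaying with the age of each data point) and to lie in [0, 1], and the normalization that gives the test point weight 1 with its mass placed at +∞ is an assumption of this sketch rather than something stated in the abstract. The randomization step for non-symmetric algorithms is not shown.

```python
import math

def weighted_quantile_with_inf(residuals, weights, alpha):
    """(1 - alpha) quantile of the weighted residual distribution.

    Assumption of this sketch: the test point carries weight 1 and its
    mass sits at +infinity, so the result can be infinite.
    """
    total = sum(weights) + 1.0            # training weights plus the test point
    threshold = (1 - alpha) * total       # unnormalized mass that must be reached
    cum = 0.0
    for r, w in sorted(zip(residuals, weights)):
        cum += w
        if cum >= threshold:
            return r
    return math.inf                       # training mass alone is not enough

def weighted_split_interval(mu_hat, calib_x, calib_y, weights, x_test, alpha=0.1):
    """Sketch of a weighted split conformal interval around mu_hat(x_test)."""
    # Absolute residuals of the pre-fitted model on the held-out points.
    residuals = [abs(y - mu_hat(x)) for x, y in zip(calib_x, calib_y)]
    q = weighted_quantile_with_inf(residuals, weights, alpha)
    center = mu_hat(x_test)
    return center - q, center + q
```

With all weights equal to 1 this reduces to the usual split conformal quantile. Giving small weights to older or otherwise less relevant points means their residuals contribute less to the quantile, which is the mechanism by which the interval can track a drifting distribution.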
This work is joint with Emmanuel Candès, Aaditya Ramdas, and Ryan Tibshirani.