Ying Jin is currently a Wojcicki Troper Postdoctoral Fellow at the Harvard Data Science Initiative, working with Professors José Zubizarreta and Marinka Zitnik at Harvard Medical School. Her research centers on conformal prediction, distribution-free inference, generalizability, distribution shifts, selective inference, and their applications in biomedical discovery and human decision-making. In Fall 2025, she will join the University of Pennsylvania as an Assistant Professor of Statistics and Data Science at the Wharton School. She obtained her PhD in Statistics from Stanford University in 2024, advised by Professors Emmanuel Candès and Dominik Rothenhäusler. Prior to that, she obtained a B.S. in Mathematics from Tsinghua University in 2019. Ying’s talk will be in the Brown Awards session at JSM Nashville, August 2–8, 2025.

 

Model-free selective inference with conformal prediction

Artificial Intelligence (AI) has revolutionized decision-making and scientific discovery in fields like drug discovery, marketing, and healthcare. To ensure the reliability of AI models in high-stakes scenarios, uncertainty quantification methods such as conformal prediction build prediction sets that cover the unknown labels of new data, quantifying the confidence in the models’ predictions. These methods typically provide on-average (marginal) guarantees which, while useful, can be insufficient for decision-making processes that are selective in nature. For instance, in drug discovery, practitioners are often interested in identifying a subset of promising drug candidates rather than assessing an “average” instance.
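As a point of reference, the sketch below illustrates the kind of marginal guarantee meant here, using split-conformal prediction intervals in Python; the score choice (absolute residuals), the predict function, and the miscoverage level alpha are illustrative placeholders rather than details of the talk.

import numpy as np

def split_conformal_interval(predict, X_calib, y_calib, X_test, alpha=0.1):
    # Nonconformity scores on held-out calibration data: absolute residuals.
    scores = np.abs(y_calib - predict(X_calib))
    n = len(scores)
    # The ceil((n+1)(1-alpha))-th smallest score; infinite when n is too small.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    qhat = np.sort(scores)[k - 1] if k <= n else np.inf
    preds = predict(X_test)
    # Marginal guarantee: the interval covers the true label with probability
    # at least 1 - alpha on average over exchangeable data, not per selected unit.
    return preds - qhat, preds + qhat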

This talk introduces Conformal Selection, a novel framework that brings selective inference capabilities to conformal prediction to address these challenges. We primarily focus on applications where predictions from black-box models are used to shortlist unlabeled test samples whose unobserved outcomes satisfy a desired property, such as identifying drug candidates with high binding affinities to a disease target in the early stages of drug discovery (virtual screening). In drug discovery, conformal prediction has been used to build prediction intervals for the unknown labels of new drug candidates, which are then used to identify promising ones before costly experimental validation. However, these approaches neglect the selection bias that arises in such data-driven decisions: the proportion of false leads among the shortlisted drug candidates is typically much higher than the nominal level for an average candidate, wasting resources in subsequent investigations. This is the well-recognized winner’s curse from classical statistical inference.

Conformal Selection allows the use of any black-box prediction model to identify unlabeled samples whose unobserved outcomes exceed user-specified values, while controlling the false discovery rate (FDR), the expected proportion of falsely selected units among those selected. Leveraging a set of labeled data that are exchangeable with the unlabeled test points, our method constructs conformal p-values that quantify the confidence in a large unobserved outcome for each test sample. It then applies the Benjamini–Hochberg (BH) procedure to determine a data-dependent threshold for the p-values as the criterion for making confident selections. Even though the conformal p-values are dependent, as they rely on the same set of labeled data, we show that their favorable positive dependence enables finite-sample, distribution-free FDR control. In several drug discovery tasks, our method narrows the drug candidates down to a manageable set of promising ones while controlling the proportion of false leads.
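For concreteness, here is a minimal Python sketch of this pipeline under those assumptions; it uses a non-randomized, conservative form of the conformal p-value, and the function and variable names are illustrative rather than the paper’s notation.

import numpy as np

def conformal_selection(scores_calib, scores_test, q=0.1):
    # scores_calib: V(X_i, Y_i) on labeled calibration data, with V monotone in y
    # (e.g., y - mu_hat(x)); scores_test: V(X_j, c_j) for the unlabeled test points.
    n = len(scores_calib)
    # Conformal p-value for H_j: Y_j <= c_j (small when the test score is unusually small).
    pvals = np.array([(1 + np.sum(scores_calib <= s)) / (n + 1) for s in scores_test])
    # Benjamini-Hochberg at level q over the m test hypotheses.
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    return order[:k]  # indices of the selected test points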

The most important assumption in Conformal Selection is that the test data are exchangeable with the labeled data. In real scientific discovery and decision-making problems, however, new data often differ from those in the training set; for instance, new drug candidates may have different scaffolds from known drugs. To address this challenge, we further introduce a Weighted Conformal Selection procedure. Assuming a covariate shift between the unlabeled test samples and the labeled training data, it builds weighted conformal p-values that remain valid for testing a single large outcome under the covariate shift. However, we prove that the favorable positive dependence among these p-values no longer holds. We therefore develop a new multiple testing procedure that calibrates individual selection thresholds for these p-values to ensure finite-sample FDR control. We also discuss robustness properties of the procedure when the covariate shift is estimated from data.
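As one piece of this construction, the following sketch shows a weighted conformal p-value under an assumed covariate-shift weight function w(x) = dP_test(x) / dP_calib(x); the subsequent step that calibrates individual selection thresholds for FDR control is not shown, and the names are illustrative.

import numpy as np

def weighted_conformal_pvalue(scores_calib, w_calib, score_test, w_test):
    # scores_calib, w_calib: scores V(X_i, Y_i) and weights w(X_i) on calibration data;
    # score_test, w_test: score V(X_j, c_j) and weight w(X_j) for one test point.
    # Non-randomized, conservative p-value for H_j: Y_j <= c_j under covariate shift.
    num = np.sum(w_calib * (scores_calib <= score_test)) + w_test
    den = np.sum(w_calib) + w_test
    return num / den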

We apply Weighted Conformal Selection to several biomedical discovery tasks with realistic distribution shifts, using the hidden embeddings from deep learning prediction models as covariates. We demonstrate that Weighted Conformal Selection achieves FDR control while effectively adjusting for distribution shifts that arise from scaffold splitting of small molecules, temporal shifts in clinical experiments, synthetic sampling with generative AI models, and protein design with mutant revisions.

This is based on my PhD work with Emmanuel Candès.