Adel Javanmard is an Assistant Professor in the Department of Data Sciences and Operations at the University of Southern California’s Marshall School of Business. Prior to joining USC in 2015, he was a postdoctoral research fellow at the Center for Science of Information, working at UC Berkeley and Stanford University. He completed his PhD in electrical engineering at Stanford, advised by Andrea Montanari, and received BSc degrees in electrical engineering and pure mathematics from Sharif University of Technology, Iran. His research interests are in high-dimensional statistics, machine learning, optimization, and personalized decision-making. Adel is the recipient of several awards and fellowships, including the NSF CAREER Award, a Google Faculty Research Award, and the Thomas Cover Dissertation Award from the IEEE Information Theory Society. He was a silver medalist at the International Mathematical Olympiad.
This Tweedie Award Lecture is due to be given at the IMS New Researchers Conference, in Philadelphia, July 29–August 1, 2020.
Statistical Inference for High-Dimensional Models
The past two decades have witnessed a rapidly growing literature on high-dimensional statistics, where the sample size n can be smaller than p, the number of covariates. High-dimensional models are de rigueur nowadays, as they lend themselves well to modern high-volume and fine-grained datasets. In particular, remarkable progress has been made on optimal point estimation and efficient computation for such models. However, the fundamental problem of statistical inference, in the form of frequentist confidence intervals and hypothesis tests, is much less developed. This problem is of crucial importance in modern data analysis. On the one hand, statistical learning methods help researchers discover unexpected patterns in data and make better decisions that affect everyday life. On the other hand, the size of datasets as well as the complexity of the methods used has made statistical models less transparent. Employing the derived models without a proper understanding of their validity can lead to many false discoveries, incorrect predictions, and massive costs when they are used as the basis for policy design and decision making. This is also intimately related to the reproducibility of discoveries and results: practitioners would like to know whether the findings of a study can be replicated in another study under the same conditions, not exactly but up to statistical error.
In the past couple of years, significant progress has been made in performing valid statistical inference on low-dimensional components of high-dimensional models, such as testing the significance of each individual model parameter. A formidable challenge along the way is that fitting complex high-dimensional models often requires non-linear and non-explicit parameter estimation procedures (such as neural networks), and, unlike in the classical regime, it is notoriously hard to characterize the probability distribution of such estimates. Furthermore, point estimators in high dimensions are necessarily biased, since they must recover a p-dimensional parameter from only n < p observations and therefore rely on regularization.
A popular approach to this problem is a method called debiasing [1,2,3]. The idea is to start with a regularized estimator that enjoys a low estimation error rate, and then move it in a direction that compensates for its bias, at the cost of adding some noise. The debiasing approach aims to find the optimal debiasing direction, controlling the bias and the variance of the resulting estimator at the same time [1]. In this lecture, I will discuss some of the major extensions and methodological developments that rely on the debiasing approach, in particular: (i) a flexible framework for testing general hypotheses about the model parameters [4], which encompasses, for example, testing whether the parameter lies in a convex cone, testing the signal strength, and testing functionals of the parameter; and (ii) online debiasing [5]. Adaptive data collection is increasingly commonplace in various applications; it induces correlation among samples and bias in the estimates, posing additional obstacles to statistical inference. I will introduce “online debiasing” to overcome these problems and discuss its applications in time series analysis.
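To make the basic debiasing idea concrete, here is a minimal Python sketch for the sparse linear model y = Xθ₀ + w. It starts from the Lasso, which minimizes (1/(2n))‖y − Xθ‖² + λ‖θ‖₁ (solved here by a plain proximal-gradient loop), applies the debiasing step θᵈ = θ̂ + (1/n) M Xᵀ(y − Xθ̂), and forms per-coordinate confidence intervals θᵈᵢ ± z σ √((M Σ̂ Mᵀ)ᵢᵢ / n), with Σ̂ = XᵀX/n. Two simplifying assumptions are made for illustration only: the noise level σ is treated as known, and the design is assumed isotropic (population covariance equal to the identity), so M defaults to the identity. In [1], M is instead obtained by solving a convex program that balances the bias and variance of θᵈ; that step is not reproduced here. The function names `lasso_ista` and `debiased_lasso` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def soft_threshold(v, t):
    # Elementwise soft-thresholding: the proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    # Minimizes (1/(2n)) * ||y - X theta||^2 + lam * ||theta||_1
    # by proximal gradient descent (ISTA) with constant step size 1/L.
    n, p = X.shape
    L = np.linalg.norm(X, ord=2) ** 2 / n  # Lipschitz constant of the gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta


def debiased_lasso(X, y, lam, sigma, M=None, level=0.95):
    # Returns the debiased estimate and per-coordinate confidence intervals.
    # Simplifying assumptions: sigma is known, and M defaults to the identity,
    # which is only appropriate for an isotropic design (Sigma = I).
    n, p = X.shape
    if M is None:
        M = np.eye(p)
    theta_hat = lasso_ista(X, y, lam)
    residual = y - X @ theta_hat
    # Debiasing step: move the Lasso estimate along M X^T (residual) / n.
    theta_d = theta_hat + M @ (X.T @ residual) / n
    # Standard errors from the Gaussian noise term (1/n) M X^T w.
    cov_hat = X.T @ X / n
    se = sigma * np.sqrt(np.diag(M @ cov_hat @ M.T) / n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return theta_d, theta_d - z * se, theta_d + z * se


# Simulated example: n < p, s-sparse true parameter, isotropic Gaussian design.
rng = np.random.default_rng(0)
n, p, s, sigma = 300, 500, 3, 1.0
X = rng.standard_normal((n, p))
theta0 = np.zeros(p)
theta0[:s] = 2.0
y = X @ theta0 + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(np.log(p) / n)

theta_d, lo, hi = debiased_lasso(X, y, lam, sigma)
coverage = np.mean((lo <= theta0) & (theta0 <= hi))
print(f"empirical coverage: {coverage:.3f}")
```

In this setting the Lasso shrinks the nonzero coefficients toward zero by roughly λ, and the debiasing step largely removes that shrinkage at the price of extra variance. With small samples and the simplified choice of M, the empirical coverage can fall somewhat below the nominal level.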
References:
1. Adel Javanmard and Andrea Montanari, 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression,” Journal of Machine Learning Research 15(1): 2869–2909.
2. Cun-Hui Zhang and Stephanie S. Zhang, 2014. “Confidence intervals for low dimensional parameters in high dimensional linear models,” Journal of the Royal Statistical Society, Series B 76(1): 217–242.
3. Sara van de Geer, Peter Bühlmann, Ya’acov Ritov, and Ruben Dezeure, 2014. “On asymptotically optimal confidence regions and tests for high-dimensional models,” Annals of Statistics 42(3): 1166–1202.
4. Adel Javanmard and Jason D. Lee, 2020. “A Flexible Framework for Hypothesis Testing in High Dimensions,” Journal of the Royal Statistical Society, Series B, forthcoming.
5. Yash Deshpande, Adel Javanmard, and Mohammad Mehrabi, 2019. “Online Debiasing for Adaptively Collected High-dimensional Data,” preprint, arXiv:1911.01040.