Gerda Claeskens is professor of statistics at KU Leuven in Belgium. She obtained her PhD degree in 1999 from the University of Hasselt. Before joining KU Leuven in 2004, she was a faculty member of the statistics department at Texas A&M University. Her main research interests include model selection, nonparametric estimation and testing, and inference after model selection. In 2008 she co-authored a book on the topic of model selection and model averaging with N.L. Hjort. Dr. Claeskens received the Noether Young Scholar Award in 2004, and was elected IMS fellow in 2012. She serves as associate editor for several journals, including the Annals of Statistics and Biometrika. Gerda’s Medallion Lecture will be delivered at this year’s JSM, which takes place in Chicago from July 30–August 4. See the preliminary program at http://www.amstat.org/meetings/jsm/2016/program.cfm
Model averaging and post-model selection
Several estimators are often available for a single population quantity. Examples include estimators of the mean response in a multiple linear regression model when the candidate submodels each include only a subset of the available covariates. A “model averaged estimator” results when such estimators, constructed from different models, are combined in a weighted average, which is then used as a single estimator of the population quantity of interest. Theoretical properties of such a weighted estimator may be very simple when all estimators are independent and the weights are deterministic. More complicated situations arise with correlated estimators, often based on the same set of data, as typically happens in variable selection problems, and when the weights are random too. Random weights are common in practice. Consider, for example, the use of a variable selection criterion such as Akaike’s information criterion (AIC), which assigns weight one to the estimator obtained from the model with the best AIC value and weight zero to all estimators from the non-selected models. Since the AIC value is computed from the data, the resulting weight is, obviously, random.
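To make the random 0/1 weights concrete, here is a minimal Python/NumPy sketch: the data-generating model and the candidate set are invented for illustration, and AIC selection is expressed as model averaging with a data-dependent weight vector.

```python
import numpy as np

def aic_gaussian(y, X):
    """AIC for a Gaussian linear model with the error variance profiled out."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # x2 is an irrelevant covariate
ones = np.ones(n)

# Candidate submodels built from subsets of the available covariates
models = {
    "intercept": np.column_stack([ones]),
    "x1": np.column_stack([ones, x1]),
    "x2": np.column_stack([ones, x2]),
    "x1+x2": np.column_stack([ones, x1, x2]),
}
aics = {name: aic_gaussian(y, X) for name, X in models.items()}
best = min(aics, key=aics.get)

# AIC selection as model averaging with random 0/1 weights:
# weight one for the selected model, zero for all others
weights = {name: float(name == best) for name in models}
```

Rerunning with a different seed can change `best`, and hence the whole weight vector: the weights are functions of the data.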
The process of model averaging gives rise to several interesting problems. Several choices have to be made in order to construct a model averaged estimator, such as: “which and how many estimators will be averaged over?” and “which weights will be used?”
Data-driven frequentist weights can be chosen by minimizing an estimator of a mean squared error expression. In general there might not be a unique set of such weights: the resulting weighted estimators may be identical for different values of the weights, so the values of the weights themselves should be interpreted with care. Indeed, we obtain that there are multiple weight vectors which yield equal model averaged predictions in linear regression models. In particular, a restriction to the so-called singleton models, in which each model includes only one parameter, results in a drastic reduction of the computational cost.
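A sketch of this non-uniqueness, in the special case of an orthonormal design where the reduction to singleton models is easy to verify directly (the setup below is invented for illustration; the models here have no intercept):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n, p = 50, 3
Q, _ = np.linalg.qr(rng.normal(size=(n, p)))  # orthonormal design columns
y = rng.normal(size=n)

def fitted(cols):
    """Least-squares fitted values using the listed design columns."""
    X = Q[:, cols]
    return X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Average over all 2^p - 1 non-empty submodels with random weights
subsets = [list(s) for r in range(1, p + 1) for s in combinations(range(p), r)]
w = rng.dirichlet(np.ones(len(subsets)))
avg_all = sum(wi * fitted(s) for wi, s in zip(w, subsets))

# A different weight vector, over the p singleton models only, gives the same
# predictions: covariate j receives the total weight of all subsets containing it
w_single = [sum(wi for wi, s in zip(w, subsets) if j in s) for j in range(p)]
avg_single = sum(wj * fitted([j]) for wj, j in zip(w_single, range(p)))
# avg_all and avg_single agree to numerical precision
```

Two distinct weight vectors, over different collections of models, produce the same model averaged prediction; only the `p` singleton fits are needed rather than all `2^p - 1` submodel fits.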
If the fact that the weights are random variables rather than fixed numbers is taken into account already when choosing the weights, different values of the “optimal” weights are found than when one starts from fixed weights. In particular, we show that the model averaged estimator is biased even when the original estimators are unbiased, and that its variance is larger than in the fixed-weights case. This relates to the “forecast combination puzzle”: there is no guarantee that the optimally weighted averaged forecast is better than the equally weighted one, or even improves on the original forecasts.
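The puzzle can be reproduced in a toy Monte Carlo experiment (my own illustration, not taken from the cited work): two unbiased forecasts are equally good, so the optimal fixed weight is one half, yet estimating the weight from a short error history inflates the mean squared error of the combination.

```python
import numpy as np

rng = np.random.default_rng(2)
R, T = 20_000, 5  # Monte Carlo replications; length of the error history

mse_equal = mse_est = 0.0
for _ in range(R):
    # Short history of past errors of two equally good, unbiased forecasts
    h1, h2 = rng.normal(size=(2, T))
    # "Optimal" weight on forecast 1, estimated from the noisy history
    w = np.var(h2) / (np.var(h1) + np.var(h2))
    e1, e2 = rng.normal(size=2)  # next-period forecast errors
    mse_est += (w * e1 + (1 - w) * e2) ** 2 / R
    mse_equal += (0.5 * e1 + 0.5 * e2) ** 2 / R
# mse_est exceeds mse_equal: the randomness of the estimated weight adds variance
```

Treating the estimated weight as if it were fixed thus understates the variability of the combined forecast.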
The distribution of model averaged estimators is, in general, hard to obtain. We work out the special case of an estimator after model selection by the Akaike information criterion, AIC. All but one of the random weights are zero; only the weight of the selected model equals one. For AIC selection we obtain an asymptotic selection region that expresses when a certain model is selected. We exploit the overselection properties of AIC to construct valid confidence regions that take the model selection uncertainty into account. While the asymptotic distributions of estimators post-selection are typically no longer normal, the particular form of the AIC allows us to use simulation to obtain asymptotic quantiles for use in confidence regions.
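The non-normality, and the role of simulation, can be seen in a simplified limiting experiment with one extra parameter whose true local value is zero (a standard textbook-style illustration, not the authors’ exact construction): in the limit, AIC selects the wider model exactly when the squared normalized estimate of the extra parameter exceeds two.

```python
import numpy as np

rng = np.random.default_rng(3)

# Limiting experiment: Z is the normalized estimator of the extra parameter
# in the wide model; its true local value is zero.
Z = rng.normal(size=200_000)

# AIC prefers the wide model when the log-likelihood gain Z**2 / 2
# exceeds the penalty of 1 for the extra parameter, i.e. when Z**2 > 2.
select_wide = Z**2 > 2

# Post-AIC estimator of the extra coefficient:
# zero under the narrow model, Z under the wide model
post = np.where(select_wide, Z, 0.0)

overselect = select_wide.mean()  # close to P(chi-squared_1 > 2), about 0.157
q90 = np.quantile(post, 0.90)    # far below the standard normal quantile 1.28
q975 = np.quantile(post, 0.975)  # simulated quantile for a confidence region
```

The simulated distribution of `post` mixes a point mass at zero with truncated normal tails, so its quantiles differ from normal ones and are most conveniently obtained by simulation, as in the lecture’s approach.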
This research opens perspectives for other post-selection inference, as well as for the more general model averaging inference.
The presented work has been obtained jointly with A. Charkhi, B. Hansen, J. Magnus, A. Vasnev and W. Wang.
More Previews in forthcoming issues
The 2016 IMS Named and Medallion lectures are at: ENAR (Austin, March 6–9), the World Congress [WC] (Toronto, July 11–15) and JSM (Chicago, July 30–August 4). Speakers are:
- Gerda Claeskens [Medallion (ML) @ JSM]
- Pierre del Moral [ML @ WC]
- Frank Den Hollander [ML @ WC]
- Vanessa Didelez [ML @ WC]
- Peter Diggle [ML @ ENAR]
- Arnaud Doucet [ML @ WC]
- Christina Goldschmidt [ML @ WC]
- Scott Sheffield [Doob @ WC]
- Sara van de Geer [Wald @ WC]
- Nanny Wermuth [ML @ JSM]
- Bin Yu [Rietz @ WC]
- Ofer Zeitouni [Schramm @ WC]
Stay tuned for more previews of their lectures in the coming months.