Trevor Hastie

Trevor Hastie is the John A. Overdeck Professor of Statistics at Stanford University. Prior to joining Stanford University in 1994, he worked at AT&T Bell Laboratories for nine years, where he helped develop the statistical modeling environment popular in the R computing system. He received a BSc (Hons) in statistics from Rhodes University in 1976, an MSc from the University of Cape Town in 1979, and a PhD from Stanford in 1984. In 2018 he was elected to the National Academy of Sciences. Trevor’s main research contributions have been in applied statistics, particularly in the fields of statistical modeling, bioinformatics and machine learning; he has published over 200 articles and written five books in this area: Generalized Additive Models (with R. Tibshirani, 1991), Elements of Statistical Learning (with R. Tibshirani and J. Friedman, 2001; 2nd edn 2009), An Introduction to Statistical Learning, with Applications in R (with G. James, D. Witten and R. Tibshirani, 2013), Statistical Learning with Sparsity (with R. Tibshirani and M. Wainwright, 2015) and the IMS Monograph Computer Age Statistical Inference (with B. Efron, 2016). He has also made contributions in statistical computing, co-editing (with J. Chambers) a large software library of modeling tools in the S language (Statistical Models in S, 1992), which forms the foundation for much of the statistical modeling in R. His current research focuses on applied statistical modeling and prediction problems in biology and genomics, medicine and industry. Trevor’s Wald Lectures will be delivered at JSM Denver, July 27–August 1, 2019.

Statistical Learning with Sparsity

This series of three talks takes us on a journey that starts with the introduction of the lasso in 1996 by Rob Tibshirani, and brings us up to date on some of the vast array of applications that have emerged. In 2015 I published a research monograph of the same name with Rob Tibshirani and Martin Wainwright (Statistical Learning with Sparsity: The Lasso and Generalizations, Hastie, Tibshirani and Wainwright, Chapman and Hall, 2015). These talks will focus on some of the topics from this book.

The community of people who have worked on sparsity and high-dimensional statistical inference is by now very large (the lasso paper alone has over 28K citations!). My work with my colleagues and students has concentrated on applied methodology, in particular algorithms and software for employing these powerful tools. All the applications I present are accompanied by software (mostly in R) that my students and I actively support and improve.

There are three Wald lectures, and they focus on different applications.

Wald Lecture I:

I motivate the need for sparsity with wide data, and then chronicle the invention of the lasso and the quest for good software. After some false starts, my colleagues and I settled on an algorithm known as coordinate descent, which is surprisingly efficient for fitting a sequence or path of sparse models. Along with our so-called strong rules for hedging the active set, our glmnet package in R (also Python and MATLAB) has remained popular. Several examples will be given, culminating in a special adaptation of glmnet called snpnet for fitting lasso models to polygenic traits using GWAS data (truly massive data). I end with a survey of some active areas of research not covered in the remaining two talks.
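The coordinate-descent idea mentioned above can be sketched in a few lines: cycle through the coefficients, and update each one by soft-thresholding its univariate least-squares solution against the partial residual. This is a minimal pure-Python illustration of that update, not the glmnet implementation (which adds warm starts along the lambda path, screening rules, and much more); the function names and the naive inner loop are my own for exposition.

```python
def soft_threshold(z, gam):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gam:
        return z - gam
    if z < -gam:
        return z + gam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    X is a list of rows; no intercept. A didactic sketch: each coordinate
    update recomputes the partial residual from scratch, which real
    implementations avoid by maintaining residuals incrementally."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # inner product of feature j with the partial residual
            zsq = 0.0   # squared norm of feature j
            for i in range(n):
                pred_minus_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_minus_j)
                zsq += X[i][j] ** 2
            # univariate lasso solution for coordinate j
            beta[j] = soft_threshold(rho / n, lam) / (zsq / n)
    return beta
```

For a path of solutions, one would call this for a decreasing sequence of `lam` values, starting each fit from the previous solution (a warm start), which is a large part of why the path algorithm is so fast in practice.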

Wald Lecture II:

With real applications, we often encounter missing data, typically regarded as a nuisance. Depending on the application, we have different ways of sweeping the problem under the rug, some more natural than others. With principal components and the SVD, there is a natural way of accommodating NAs, which appears to have been in the statistical folklore for a long time. Matrix completion re-emerged during the Netflix competition as a way to compute a low-rank SVD in the presence of a large amount of missing data, and for imputing missing values. I discuss some aspects of this problem, and describe several algorithms for finding a path of solutions. Here sparsity comes in two forms: sparsity in the observed entries of the matrix, and sparsity in the singular values of the solutions. I illustrate with applications in a variety of areas, including recommender systems and the modeling of sparse longitudinal multivariate data.
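The alternating idea behind this style of matrix completion can be sketched concretely: fill in the missing entries from the current low-rank estimate, refit an SVD whose singular values are soft-thresholded (so large thresholds give low-rank, "sparse-spectrum" solutions), and repeat. Below is a rank-1, pure-Python toy version of that scheme, using power iteration in place of a real SVD routine; it is my own illustrative sketch of the general soft-impute idea, not the algorithm from the lecture or the softImpute package.

```python
def soft_impute_rank1(M, lam, n_outer=50, n_power=100):
    """Rank-1 sketch of soft-impute: alternate between (1) imputing the
    missing entries (None) from the current low-rank fit and (2) refitting
    a rank-1 SVD whose singular value is soft-thresholded by lam."""
    m, n = len(M), len(M[0])
    # start by imputing zeros for the missing entries
    Z = [[M[i][j] if M[i][j] is not None else 0.0 for j in range(n)]
         for i in range(m)]
    for _ in range(n_outer):
        # top singular triplet (s, u, v) of Z via power iteration
        v = [1.0] * n
        for _ in range(n_power):
            u = [sum(Z[i][j] * v[j] for j in range(n)) for i in range(m)]
            nu = sum(x * x for x in u) ** 0.5 or 1.0
            u = [x / nu for x in u]
            v = [sum(Z[i][j] * u[i] for i in range(m)) for j in range(n)]
            nv = sum(x * x for x in v) ** 0.5 or 1.0
            s, v = nv, [x / nv for x in v]
        s = max(s - lam, 0.0)  # soft-threshold the singular value
        # re-impute: observed entries stay fixed, missing ones take the fit
        Z = [[M[i][j] if M[i][j] is not None else s * u[i] * v[j]
              for j in range(n)] for i in range(m)]
    return Z
```

On a rank-1 matrix with one entry missing and `lam=0`, the iterations recover the held-out value; with a very large `lam` the fit is shrunk to zero and the missing entry stays at its zero imputation, which is the sparsity-in-the-singular-values end of the path.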

Wald Lecture III:

As the sparsity literature has progressed over the years, some ingenious extensions have been proposed. One of these is the group lasso (Yuan and Lin, 2006, JRSS B), which selects groups of variables. I briefly outline three projects that have employed these ideas; two concerning generalized additive model selection, and one for selecting interactions in a linear model. Then, in a different direction, the graphical lasso builds sparse inverse covariance matrices to capture the conditional independencies in multivariate Gaussian data. I discuss this approach and extensions, and then illustrate its use for anomaly detection and imputation with high-dimensional data.
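The mechanism by which the group lasso selects whole groups can be shown with its key ingredient, the groupwise soft-thresholding (proximal) step: the Euclidean norm of a coefficient block is shrunk, and if the penalty exceeds that norm the entire block is set exactly to zero. A minimal sketch (the function name is mine; real solvers embed this step inside block coordinate descent over the groups):

```python
def group_soft_threshold(z, lam):
    """Group-lasso proximal step: shrink the block z toward zero by lam
    in Euclidean norm; the block is either kept (uniformly shrunken) or
    zeroed out as a whole, which is what yields groupwise selection."""
    norm = sum(x * x for x in z) ** 0.5
    if norm <= lam:
        return [0.0] * len(z)     # whole group dropped from the model
    scale = 1.0 - lam / norm
    return [scale * x for x in z]
```

Contrast this with the ordinary lasso's elementwise soft-thresholding, which can zero individual coefficients within a group but never forces the group in or out together.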