Anirban DasGupta writes:
PhD students in statistics departments across the world are asked to take a course on the core theory of inference, the so-called “qualifier theory course”. I took mine at the ISI in Calcutta in 1977. It was masterfully taught by K.K. Roy, and covered what was essentially globally regarded as the core of inference: exponential families, sufficiency, ancillarity, completeness and UMVU, MLEs, Fisher information and Cramér-Rao, asymptotics of MLE, consistency, delta and Slutsky’s theorem, NP lemma, MP and UMP tests, MLR families, LRT, UMA confidence sets, duality of confidence and testing, basic game theory, Bayes rules, admissibility, minimaxity, Wald’s SPRT, and rank tests. Bickel and Doksum had just come out and hadn’t reached India; we used Ferguson (1967), and we all agreed that it was a fulfilling course.
But that was then, and this is now. Through purely personal interactions, I have gathered anecdotal evidence of a sentiment that part of what was long regarded as the core of inference is no longer considered particularly relevant. The events and discoveries of the last thirty years make it necessary to re-evaluate the core theory of statistics that a fresh PhD ought to be expected to know and understand. To take my colleagues’ pulse, I checked the syllabi of the core course at Berkeley, Stanford, Chicago, Washington, UPenn, Carnegie Mellon, and Duke. I also contacted 11 experts in the US, Europe, Australia, and India, and solicited their definitions of what should be in the qualifier theory course. The responses quite surprised me.
I was surprised by the fantastic diversity of opinion on what should be in that first theory course. Barring sufficiency, exponential families, and MLEs, the intersection of the definitions was empty. But there was an unmistakable desire to put less emphasis on parametrics, unbiasedness, UMP tests, certain parts of decision theory, traditional sequential analysis, and the old nonparametrics. On the other hand, the responses included many “new age” topics: the bootstrap, AIC and BIC, VC theory, empirical processes, permutation tests, EM, MCMC, function estimation, sparsity, causal inference, extreme values, and more.
The responses are revealing. They tell me that while there is a sharp hunger for change in the traditional core course, it is no longer possible to reach even an approximate global consensus on what that first course should teach. The core course will probably become rather local, and we will no longer be able to assume that a fresh PhD from a statistics program has seen, and been tested on, a common set of topics in inference.
What would I teach personally in that first course? After reading all the responses I received, I find myself agreeing with two statements from John Marden and Philip Stark: the first math-stat course should be about how to think about models and inference, and the mathematical (as opposed to computational) framework to attack the problems; and, you need to have some idea of what’s possible, and where to look for approaches, ideas, inspirations, theorems. My own new doctrine could be something like this: the topics, with the number of 50-minute lectures on each (43 in total).
- Problems and basic principles of inference, selecting and evaluating a procedure, loss and risk, bias, variance, parametric vs. nonparametric modelling, optimality vs. robustness, Hogg’s adaptive estimate (nontechnical; 2);
- modelling, location-scale, exponential families, mixtures, heavy tails, non/semi-parametric models, dependence (4);
- data summary, likelihood function, sufficiency, factorization and Rao-Blackwell, definition of UMVU (3);
- score function, Fisher information, information matrix, Cramér-Rao (3);
- MLEs, general and in exponential families, some nonregular, some multiparameter, nonexistence, difficulty of computing, reference to EM in Bickel-Doksum (3);
- simulate the Cauchy MLE and the sample median (a small sketch of this exercise follows the list), statement of the portmanteau theorem, asymptotic normality and Cramér-Rao conditions, observed information and sandwich, two-parameter Gamma, plug-in, delta and Slutsky, applications (5);
- priors, posteriors, conjugate priors, posterior means, posterior and Bayes risk, comparison with MLE, Bayes vs. minimaxity, from Bickel-Doksum (5);
- testing, error probabilities and power, NP lemma, applications, statement of UMP one-sided tests (3);
- LRT, three examples from Bickel-Doksum, chi-square limit (3);
- confidence sets, duality with testing, t interval, asymptotic confidence intervals, definition of posterior credible interval (3);
- the empirical CDF; SLLN, Glivenko-Cantelli and DKW; purpose, use, and scope of the bootstrap; bootstrap bias and variance estimation; consistency of the bootstrap; permutation tests (4);
- a personal selection from James-Stein and Donoho-Johnstone estimates, sparsity, SURE, Gaussian sample maximum, kernels, RKHS and function estimation, model choice, AIC, BIC, VC and martingale inequalities, Bayes factors, Dirichlet process, Bernstein-von Mises (5).
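To give a flavour of the Cauchy simulation exercise mentioned above, here is a minimal sketch, not taken from any particular text: it assumes numpy and scipy are available, fixes the Cauchy scale at 1, and uses arbitrary choices for the sample size, number of replications, and random seed. It compares the Monte Carlo variances of the location MLE and the sample median with their asymptotic values, 2/n and π²/(4n) respectively.

```python
# A sketch of the Cauchy MLE vs. median simulation; sample size, replication
# count, seed, and the search interval for the MLE are all arbitrary choices.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n, reps = 50, 2000                      # sample size and Monte Carlo replications
mles, medians = [], []

def neg_loglik(mu, x):
    # Negative log-likelihood of Cauchy(mu, scale=1), up to an additive constant
    return np.sum(np.log(1.0 + (x - mu) ** 2))

for _ in range(reps):
    x = rng.standard_cauchy(n)          # true location 0, scale 1
    # Simple bounded 1-D search; the Cauchy likelihood can be multimodal,
    # so this may occasionally return a local rather than global maximizer.
    res = minimize_scalar(neg_loglik, args=(x,), bounds=(-10, 10), method="bounded")
    mles.append(res.x)
    medians.append(np.median(x))

# Asymptotic theory: Var(MLE) ~ 2/n, Var(median) ~ pi^2/(4n), so the ARE of the
# median relative to the MLE is 8/pi^2, roughly 0.81.
print("empirical variance, MLE   :", np.var(mles))
print("empirical variance, median:", np.var(medians))
print("theoretical 2/n           :", 2 / n)
print("theoretical pi^2/(4n)     :", np.pi ** 2 / (4 * n))
```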
Perhaps someone else will address what should be taught in Stat 100. Opinions will probably differ on that, too. But that’s an issue for another day.