Jeffrey S. Rosenthal, Professor of Statistics, University of Toronto, writes:
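It happens to the instructor of every university-level introductory statistics class. You define the mean $m$ and the variance $v$. You explain how to estimate the mean from an i.i.d. sample $x_1, \ldots, x_n$, via $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. You then explain how to estimate the variance, via $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$. And then a student asks why you divide by $n-1$ rather than $n$.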
You then have a choice. You can awkwardly explain that this division “will be explained later”. Or you can protest that if $n$ is large then “it doesn’t really matter”. Or you can mystically muse that “if $n=1$ then the answer should be undefined, not zero”. Or you can launch into a confusing and premature explanation of unbiased estimators, which at that stage will enlighten almost no one. (The situation is so dire that some instructors refuse to teach $s^2$ at all, cf. [1].) Meanwhile, the students would prefer to simply divide by $n$, corresponding to taking an average value, which everyone understands. So why can’t they?
The usual answer, of course, is that $s^2$ is an unbiased estimator of $v$, i.e. $E(s^2) = v$. And everyone knows that unbiased estimators are so important that they trump any concerns about simplicity or comprehensibility.
Or do they? In preparing my teaching this year, I started to question this assumption. After all, the true value of an estimator is how accurately it estimates. And the best way to measure the accuracy of an estimate is through the mean squared error (MSE). Now, the MSE is the sum of the squared bias and the estimator’s variance. If the estimator is unbiased, then the bias term is zero, which is good. But could this come at the expense of increasing the estimator’s variance, and hence increasing its MSE? Perhaps yes!
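For reference, the decomposition mentioned above can be written out in a few lines. The symbols $\hat\theta$ (an estimator) and $\theta$ (the true parameter) are notation introduced here for illustration, not the author's. The cross term vanishes because $E[\hat\theta - E\hat\theta] = 0$.

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E\hat\theta)^2\big] + \big(E\hat\theta - \theta\big)^2 \\
  &= \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2 .
\end{align*}
```

An unbiased estimator makes the second term zero, but nothing in this identity stops a slightly biased estimator from having a smaller first term and a smaller total.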
For a simple example, suppose a true parameter is 8, and our estimator equals either 6 or 10 with probability
If
This example suggests the possibility, at least, that
Fortunately, such calculations have already been done (see e.g. [2]). Indeed, if
where
So, the next time I explain estimating the variance, I am going to divide by $n$.
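To make the comparison between divisors concrete, here is a minimal sketch, assuming normally distributed data (an assumption made here, not a claim about the general case). It computes the exact MSE of the estimator $\frac{1}{a}\sum_i (x_i - \bar{x})^2$ from the standard fact that $\mathrm{Var}(s^2) = 2v^2/(n-1)$ for normal samples, and cross-checks it by simulation. The function names `normal_mse` and `simulated_mse` are hypothetical, chosen for this illustration.

```python
import random
from fractions import Fraction


def normal_mse(n, a):
    """Exact MSE, in units of v**2, of sum((x_i - xbar)**2) / a.

    Assumes i.i.d. normal data, for which Var(s^2) = 2 v^2 / (n - 1),
    where s^2 is the usual divide-by-(n-1) estimator.
    """
    scale = Fraction(n - 1, a)                 # estimator = scale * s^2
    variance = scale**2 * Fraction(2, n - 1)   # variance term of the MSE
    bias = scale - 1                           # bias, in units of v
    return variance + bias**2


def simulated_mse(n, a, v=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of the same MSE (not normalised by v**2)."""
    rng = random.Random(seed)
    sd = v ** 0.5
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sd) for _ in range(n)]
        xbar = sum(xs) / n
        est = sum((x - xbar) ** 2 for x in xs) / a
        total += (est - v) ** 2
    return total / trials


if __name__ == "__main__":
    n = 10
    for a in (n - 1, n, n + 1):
        print(a, float(normal_mse(n, a)), round(simulated_mse(n, a), 4))
```

Under these assumptions with $n = 10$, the exact MSE is $2/9$ when dividing by $n-1$, $19/100$ when dividing by $n$, and $2/11$ when dividing by $n+1$, so the unbiased choice has the largest MSE of the three.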
References
[1] D.J. Rumsey (2009), Let’s just eliminate the variance. Journal of Statistics Education 17(3). Available at: www.amstat.org/publications/jse/v17n3/rumsey.html
[2] Wikipedia, Mean squared error: Variance. Retrieved August 26, 2015. Available at: en.wikipedia.org/wiki/Mean_squared_error#Variance. (See also www.probability.ca/varmsecalc).