Contributing Editor Anirban DasGupta writes:

A wonderful thing about tenure is that once I had it, I never had to control my irresistible urge to waste my time on the most useless of all things. The other day, a close friend said to me, “But I was almost right!” I have not the slightest notion why this pedantic remark of a friend made me wonder if our everyday confidence intervals (sets) are almost right even when they are wrong, and squarely right when they are right. At the clear risk of saying things that were all done a long time ago, I want to report a few simple, but perhaps interesting, facts on how right our confidence sets are when they are right, how wrong they are when they are wrong, and how, precisely, the dimension of the problem affects the answers.

Simplicity has its virtues. So, how about starting with a simple example that we can easily relate to? Take the $t$ interval, say $C_n$, given by $\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$, for the mean $\mu$ of a one-dimensional CDF $F$ with a finite variance. Its margin of error is of course $\delta_n = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$. When our $t$ interval misses the true $\mu$, the amount by which it misses, say $d_n$, is the distance of $\mu$ from the appropriate endpoint of $C_n$. Expressed in units of the margin of error, the amount by which we miss is $w_n = \frac{d_n}{\delta_n}$; $d_n$ and $\delta_n$ both go down at the rate $\frac{1}{\sqrt{n}}$, and it seemed as though $w_n$ is a better index, practically, than simply $d_n$. I wanted to understand how large $w_n$ is when the $t$ interval fails, for example, what $E_F(w_n \mid \mu \notin C_n)$ is.
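For concreteness, here is a minimal sketch (my own, not from the column) of how one might compute $w_n$ from a single sample; the function name and its arguments are purely illustrative.

```python
# Compute the t interval's margin of error delta_n and the miss index
# w_n = d_n / delta_n for one sample; returns None when the interval covers mu.
import numpy as np
from scipy import stats

def miss_in_moe_units(x, mu, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    delta = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)  # margin of error
    d = abs(xbar - mu) - delta   # distance of mu from the nearer endpoint, if positive
    return d / delta if d > 0 else None
```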

Of course, I did simulate it first. I simulated for seven choices of $F$: $N(0,1)$, standard double exponential, $t_3$, $U[-1,1]$, Beta$(1/2,1/2)$, Poisson$(4)$, and $\chi^2_4$, using in each case a simulation size of 8,000 and $\alpha = .05$, $n = 50$, a gentle sample size. My simulation averages of $w_n$ (conditioned on failure) in the seven cases were .23, .18, .19, .22, .21, .20, and .24. I understood the simulations to mean that the 95% $t$ interval misses $\mu$ by about 20% of the margin of error when it misses. But why are the simulation averages all so tantalizingly close to 20%, although the distributions simulated are very different? We must then expect that there is a theorem here. It turns out that whenever $F$ has a finite variance, $E_F(w_n \mid \mu \notin C_n) \to \frac{2\,\phi(z_{\alpha/2})}{\alpha\, z_{\alpha/2}} - 1 = .1927$ for $\alpha = .05$, and this explains why my simulation averages all hovered around .2. We can say more; we have, for $w > 0$, $P_F(w_n > w \mid \mu \notin C_n) \to \frac{2\,[1 - \Phi((1+w)\, z_{\alpha/2})]}{\alpha}$. I will apply this to predicting a US Presidential election in closing. Higher order expressions for $P_F(w_n > w \mid \mu \notin C_n)$ are derivable (in nonlattice cases) by using results in Hall (1987, AOP).
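For readers who want to re-run the experiment, here is a minimal Monte Carlo sketch (mine, not the column's code); $n = 50$, $\alpha = .05$ and the 8,000 replications follow the text, three of the seven distributions are shown, and everything else is an assumption of the sketch.

```python
# Re-create the simulation: average of w_n over the replications in which the
# 95% t interval misses the true mean, then compare with the limiting constant.
import numpy as np
from scipy import stats

def cond_mean_w(draw, mu, n=50, alpha=0.05, reps=8000, seed=0):
    rng = np.random.default_rng(seed)
    x = draw(rng, (reps, n))                              # reps samples of size n
    xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    delta = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    w = (np.abs(xbar - mu) - delta) / delta               # equals d_n / delta_n when positive
    return w[w > 0].mean()                                # condition on a miss

print(cond_mean_w(lambda rng, size: rng.standard_normal(size), mu=0.0))  # N(0,1)
print(cond_mean_w(lambda rng, size: rng.standard_t(3, size), mu=0.0))    # t_3
print(cond_mean_w(lambda rng, size: rng.chisquare(4, size), mu=4.0))     # chi^2_4
z = stats.norm.ppf(0.975)
print(2 * stats.norm.pdf(z) / (0.05 * z) - 1)                            # the limit, .1927
```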

The other side of the coin is how right the interval is when it is right, for example, $E_F\!\left(\frac{|\bar{X} - \mu|}{\delta_n} \,\Big|\, \mu \in C_n\right)$. And here, it turned out that this converges to $\frac{2\,[\phi(0) - \phi(z_{\alpha/2})]}{(1-\alpha)\, z_{\alpha/2}} = .3657$ for $\alpha = .05$; that is, when we succeed, {\it whatever be our $F$}, the true $\mu$ is, on average, about 63% of the margin of error deep inside the interval from its boundary. I will let others decide if these two numbers, .1927 and .3657, are good or bad.
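The analogous check for the success side is immediate (again my own sketch, using the Gaussian case with the same settings as before):

```python
# Conditional on covering mu, the average of |xbar - mu| / delta_n, versus the
# limiting constant 2[phi(0) - phi(z_{alpha/2})] / ((1 - alpha) z_{alpha/2}).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, reps = 50, 0.05, 8000
x = rng.standard_normal((reps, n))                      # N(0,1), true mu = 0
xbar, s = x.mean(axis=1), x.std(axis=1, ddof=1)
delta = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
ratio = np.abs(xbar) / delta
print(ratio[ratio <= 1].mean())                         # simulated; should be near .3657

z = stats.norm.ppf(1 - alpha / 2)
print(2 * (stats.norm.pdf(0) - stats.norm.pdf(z)) / ((1 - alpha) * z))   # .3657
```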

For the extension to higher dimensions, a little more notation is unavoidable. I let $F$ be a CDF in $p$-space with a covariance matrix $\Sigma$, which I treat as known, and as my confidence set I take the usual (Gaussian) ellipsoid centered at the sample mean and oriented by $\Sigma$. The known-$\Sigma$ assumption does not affect first order asymptotics in this problem, if $p$ is held fixed. One can write a formula: $E(w_n \mid \mu \notin C_n) = \frac{\sqrt{2}\,\Gamma\!\left(\frac{p+1}{2}\right)}{\alpha\,\Gamma\!\left(\frac{p}{2}\right)\sqrt{\chi^2_{\alpha,p}}}\; P\!\left(\chi^2_{p+1} > \chi^2_{\alpha,p}\right) - 1$.
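Here is a small numerical sketch of that formula (mine; `chi2.ppf(1 - alpha, p)` plays the role of $\chi^2_{\alpha,p}$, and the Monte Carlo check takes the Gaussian case with $\Sigma = I_p$ and $\mu = 0$, so that everything is explicit):

```python
# Evaluate E(w_n | miss) for the known-Sigma confidence ellipsoid, and check it
# against a simulation in the Gaussian case (Sigma = identity, mu = 0, p = 5).
import numpy as np
from scipy import stats
from scipy.special import gammaln

def exact_cond_mean_w(p, alpha=0.05):
    c = stats.chi2.ppf(1 - alpha, df=p)                     # chi^2_{alpha,p}
    log_ratio = gammaln((p + 1) / 2) - gammaln(p / 2)       # log of Gamma((p+1)/2)/Gamma(p/2)
    return np.sqrt(2) * np.exp(log_ratio) * stats.chi2.sf(c, df=p + 1) / (alpha * np.sqrt(c)) - 1

def mc_cond_mean_w(p, n=50, alpha=0.05, reps=40000, seed=2):
    rng = np.random.default_rng(seed)
    c = stats.chi2.ppf(1 - alpha, df=p)
    xbar = rng.standard_normal((reps, p)) / np.sqrt(n)      # sample means, Sigma = I_p
    t2 = n * (xbar ** 2).sum(axis=1)                        # squared Mahalanobis distance ~ chi^2_p
    w = np.sqrt(t2 / c) - 1                                 # miss distance in margin-of-error units
    return w[w > 0].mean()

print(exact_cond_mean_w(1))                   # p = 1 reduces to the one-dimensional .1927
print(exact_cond_mean_w(5), mc_cond_mean_w(5))
```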

Now, the analogous limit result on $E_F(w_n \mid \mu \notin C_n)$ needs a bit more work, as one needs to use higher order Stirling approximations to the Gamma function, Edgeworth expansions for a $\chi^2$ statistic, and Cornish-Fisher expansions for a $\chi^2$ percentile, and then collect terms. My personal curiosity was about large $p$, and it turned out that $E(w_n \mid \mu \notin C_n) = \frac{\phi(z_\alpha) - \alpha\, z_\alpha}{\alpha\,\sqrt{2p}} + O(p^{-1})$; so, in units of the margin of error, the amount by which the ellipsoid misses, when it does miss, goes down with the number of dimensions at the rate $\frac{1}{\sqrt{p}}$. The higher the dimension, the more the true $\mu$ is {\it just around the corner} when we miss.
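The $1/\sqrt{p}$ decay is easy to see numerically (my own comparison; the middle column repeats the exact formula from the previous sketch, and the last column is the large-$p$ form just quoted):

```python
# Exact E(w_n | miss) for the Gaussian ellipsoid versus the large-p approximation
# (phi(z_alpha) - alpha z_alpha) / (alpha sqrt(2p)).
import numpy as np
from scipy import stats
from scipy.special import gammaln

alpha = 0.05
z_a = stats.norm.ppf(1 - alpha)                        # z_alpha: the chi-square miss is one-sided
lead = (stats.norm.pdf(z_a) - alpha * z_a) / alpha     # leading constant
for p in (2, 5, 20, 100, 500):
    c = stats.chi2.ppf(1 - alpha, df=p)
    exact = (np.sqrt(2) * np.exp(gammaln((p + 1) / 2) - gammaln(p / 2))
             * stats.chi2.sf(c, df=p + 1) / (alpha * np.sqrt(c)) - 1)
    print(p, round(exact, 4), round(lead / np.sqrt(2 * p), 4))
```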

I close the circle by returning to one dimension. Take the case of predicting a very close US Presidential election. Stratification and nonresponse aside, we are dealing with a binomial $p$. If we poll $n = 6765$ voters and use a 90% Wald interval, then the pollster may state that the poll's margin of error is at most 1%, and that in case, by misfortune, the poll is wrong, the true $p$ is within at most another half a percentage point with 90% probability. Very many public polls use only about 1,000 voters. If we poll only 1,000 voters, we can claim that our margin of error is at most 2.6%, and that in case our poll is wrong, the true $p$ is within at most another 1.5% with 90% probability.
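A quick back-of-the-envelope check of those poll numbers (my own arithmetic, combining the worst case $p = 1/2$ with the conditional tail formula given earlier):

```python
# Worst-case margin of error of a 90% Wald interval, and the extra distance w * moe
# such that, given a miss, the true p is within it with conditional probability 90%.
import numpy as np
from scipy.stats import norm

alpha, tail = 0.10, 0.10                      # 90% interval; 90% conditional bound on the miss
z = norm.ppf(1 - alpha / 2)                   # 1.645
w = norm.ppf(1 - tail * alpha / 2) / z - 1    # solves 2[1 - Phi((1 + w) z)] / alpha = tail
for n in (6765, 1000):
    moe = z * 0.5 / np.sqrt(n)                # margin of error at p = 1/2
    print(n, round(moe, 3), round(w * moe, 3))  # about (0.01, 0.006) and (0.026, 0.015)
```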

This story remains the same for essentially all {\it LAN} problems. The corresponding Bayesian problems are similar. And now, I must find myself some other completely useless thought to keep me entertained!