Xiao-Li Meng finds it harder to escape statistics than he thought…

The 2014 AAAS (American Association for the Advancement of Science) Annual Meeting (held February 13–17) has given me a new meaning for “Valentine’s Escape”. February is always jam-packed with most of the 57 graduate program admissions meetings, many starting at 8 a.m. I was therefore longing for a “morning escape”, one where I could start my morning haphazardly (not randomly) without any haphazard consequences. Furthermore, for my continuing general education (see May 2013 XL-Files), I also felt I needed an intellectual escape from statistics.

The AAAS meeting therefore seemed perfect, but my actual experience was anything but an escape. Cruelly, enticing Scientific Symposia sessions such as “The Physics of Information” started right at 8 a.m.! Worse, unlike admissions meetings, no one would wait for me (or give a hoot if I was there or not). Nor could I afford to be late, as I could for a statistical session, for which I usually can reasonably impute the missed content from an abstract. The end result was that I even lost those weekend “morning escapes” I normally would have had!

So, did I have better luck escaping from statistics, since I deliberately avoided any session with big (signs of) data? Well, you be the judge. The symposium on “The Physics of Information” was really about quantum physics, quantum computation, and quantum cryptography. The fact that the only phrase involving quantum that I had some understanding of was “quantum leap” did not stop me from wondering whether the quantum phenomenon implies that the world we live in is fundamentally stochastic, and that all the ignorability assumptions about data collections are always approximate, with differences only being to what degree. I wondered whether for statisticians an effective way to appreciate the uncertainty principle, which says the product of the variance of the position and the variance of the momentum is bounded below by a positive constant, is to express the Cramér–Rao lower bound (CRLB) in an analogous form. That is, the CRLB says that the product of the variance of an unbiased estimator and the variance of the score function must be bounded below by 1. (Of course, the deeper connections between the uncertainty principle and the CRLB require more space than the margin of any XL-File; but see http://www-isl.stanford.edu/~cover/papers/dembo_cover_thomas_91.pdf)
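For readers who want the analogy spelled out, the two inequalities can be put side by side. This is only a sketch under the usual regularity conditions, and the notation is an assumption introduced here rather than taken from the column: T is an unbiased estimator of a scalar parameter θ, f is the density of the data X, S is the score function, and I(θ) is the Fisher information.

```latex
\begin{align*}
  % Uncertainty principle: variance of position times variance of momentum
  \sigma_x^2 \, \sigma_p^2 &\ge \frac{\hbar^2}{4}, \\[4pt]
  % Score function; it has mean zero, so its variance is the Fisher information
  S(\theta) &= \frac{\partial}{\partial\theta} \log f(X;\theta),
  \qquad \operatorname{E}_\theta[S(\theta)] = 0,
  \qquad \operatorname{Var}_\theta\bigl(S(\theta)\bigr) = I(\theta), \\[4pt]
  % Unbiasedness of T gives unit covariance with the score
  \operatorname{Cov}_\theta\bigl(T, S(\theta)\bigr)
    &= \frac{\partial}{\partial\theta} \operatorname{E}_\theta[T] = 1, \\[4pt]
  % Cauchy--Schwarz then yields the CRLB in product form
  \operatorname{Var}_\theta(T) \, \operatorname{Var}_\theta\bigl(S(\theta)\bigr)
    &\ge \bigl[\operatorname{Cov}_\theta\bigl(T, S(\theta)\bigr)\bigr]^2 = 1 .
\end{align*}
```

Dividing the last line by I(θ) recovers the familiar statement that the variance of T is at least 1/I(θ).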

Admittedly this was a self-imposed trap—how could any statistician expect to escape from a healthy dose of statistics by entering the quantum world? So what about a session on effective communication between scientific and religious communities? At least there I shouldn’t expect myself to raise any serious issue of a statistical nature, right? Wrong again, as the session was largely about a major survey on the perceptions these communities have about each other. I was deeply trapped again the moment the speaker mentioned that only those who responded to all questions were included in the analysis. As this was a session on effective communication, I was rather proud of myself when I framed my question as, “Given it is well-known that those who have stronger opinions are more likely to respond, could we interpret that your findings reflect more of those who have stronger perceptions?”

My intention should be obvious. Instead of criticizing the non-response bias, I offered the speaker the option of redefining the target population so the complete-case analysis is relevant. The redefinition here is more than a post-analysis face-saving strategy, because for matters such as the impact of perceptions on policies, those who have stronger opinions can and do matter more, and therefore focusing on that sub-population is not a useless exercise. I therefore expected the speaker to gladly take my suggestion, or at least to acknowledge it as a possible interpretation. But a good intention is not always well received. The speaker’s answer was a quick “No,” because, “It is a well accepted practice in sociology to only analyze the complete answers, and the response rate is large enough.” (About 60% of cases were reported to be complete.)

The first reason would drive most statisticians up the wall. Indeed, I had to repeatedly remind myself that I was on “escape” in order to prevent myself from accidentally insulting an entire discipline (“Just because everyone does it does not make it right,” was on the tip of my tongue). The second reason is even more dangerous because using sample/response fraction to control bias is far more difficult than using sample size to control variance. For example, a biased sampling mechanism can easily induce a larger mean-squared error than a simple random sample of size 500 can, even if it records 95% of the population (see, e.g., http://www.stat.harvard.edu/Faculty_Content/meng/COPSS_50.pdf).
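A toy simulation can make the last claim concrete. It is a minimal sketch under assumptions that are mine, not the cited paper's: a standard normal population of 20,000, and a deliberately strong selection mechanism in which the largest 5% of values are never recorded. The function name `compare_mse` and all its parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics


def compare_mse(pop_size=20_000, srs_size=500, keep_frac=0.95, reps=200, seed=0):
    """Compare the error of a 95% biased 'sample' with a small simple random sample."""
    rng = random.Random(seed)
    # Assumed population: independent standard normal values.
    population = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    true_mean = statistics.fmean(population)

    # Assumed biased mechanism: the largest values are never recorded.
    # It is deterministic given the population, so its squared error is its MSE.
    n_keep = round(keep_frac * pop_size)
    recorded = sorted(population)[:n_keep]
    biased_mse = (statistics.fmean(recorded) - true_mean) ** 2

    # Simple random sampling without replacement: average the squared error
    # over many replicates to approximate the MSE of the sample mean.
    sq_errors = []
    for _ in range(reps):
        sample = rng.sample(population, srs_size)
        sq_errors.append((statistics.fmean(sample) - true_mean) ** 2)
    srs_mse = statistics.fmean(sq_errors)

    return biased_mse, srs_mse


biased_mse, srs_mse = compare_mse()
print(f"MSE, biased mechanism recording 95% of the population: {biased_mse:.5f}")
print(f"MSE, simple random sample of size 500:                 {srs_mse:.5f}")
```

Under these assumptions the biased mechanism's error does not shrink as the population grows, whereas the simple random sample's error falls with its sample size. A milder selection mechanism would narrow the gap, so the numbers illustrate the phenomenon rather than measure it.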

Retrospectively I laughed at myself for trying to escape from statistics. The only way I can forgive myself for this dumb attempt is to perform a statistical self-flagellation, that is, to repeatedly submit papers to The Annals of Statistics (until acceptance)!