Xiao-Li Meng writes:

My sabbatical orientation at Lugano (see the last XL-Files) boosted my over-confidence into double digits. Anyone who asked about my sabbatical plan would get an ambitious answer: that I would complete 14 articles during my sabbatical year. The year is now (at the time of writing) 58.33% over. My accomplishment, you guessed it, is significantly lower: 39.29%, to be exact. In addition to the usual non-linear path of research progress, what slows me down are the never-ending errors I manage to create. Every morning I promised myself that this would be the day for the final proofreading. Yet I would retire in the evening with another 20–30 red circles on the draft. This happened on Dec 21, Dec 22, and Dec 23, a replay of Groundhog Day, undoubtedly pleasing card-carrying frequentists. On Christmas Eve I took a deep breath and forced my fingers into the submission system before the temptation for yet another final proofreading could rise again. Finally, I could have a proofreading-free Christmas Day.

Most of my errors are of a writing nature. Spellcheck has saved me thousands of times, but it cannot save me from confusing “a/an” with “the”, or mistaking “crispy” for “crisp”. As a non-native speaker, I find this extremely frustrating, because I simply do not possess that this-does-not-sound-right gut feeling. Far more time-consuming, however, is seeking an enticing flow for both novice and expert readers. I almost never get the flow right on the first few tries, and sometimes a “final” proofreading compels a major reorganization. It’s always an internal struggle between the impulse to publish fast and the desire to produce a well-written, long-lasting article. The mantra “It’s hard to publish, but impossible to unpublish” can be very helpful when conducting this internal dialogue.

Indeed, I wish I had understood this mantra when I was publishing my thesis work. I managed to publish quite a few papers out of my thesis, but there is at least one that I now wish I could unpublish. To be sure, it contains no technical error that I am aware of, nor can it have many writing errors; after all, it was published in a top journal. I was proud of it because it represented the first idea for which I could claim full credit and genuine novelty simultaneously. Before that work, all hypothesis testing procedures with multiply imputed datasets were based on Wald-type test statistics. One day, I just had this cute idea of manipulating complete-data likelihood ratio functions to compute multiple-imputation likelihood ratio tests almost as effortlessly as the Wald-type tests. I established its theoretical validity and demonstrated satisfactory performance on a real dataset, which apparently convinced the reviewers.

Over the years, the procedure made its way into a software package, and then the inquiries came in. Why did the software produce negative test statistic values when the reference distribution is an F distribution? I knew the answer. The test was built on an asymptotic equivalence between Wald and likelihood ratio statistics, and how soon the asymptotics kick in depends on the parametrization. It thus came as no surprise that the procedure could fail badly with small datasets.
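To see where the negative values can come from, here is the combining rule in schematic form; the symbols below are illustrative, stated from memory rather than in the exact notation of the original paper. With $m$ imputed datasets and a null hypothesis imposing $k$ constraints, let $d_l$ be the complete-data likelihood ratio statistic from the $l$-th completed dataset, and let $\tilde d_l$ be the same statistic re-evaluated at the parameter estimates pooled (averaged) across all $m$ imputations. The procedure then computes, roughly,
\[
\bar d = \frac{1}{m}\sum_{l=1}^{m} d_l,\qquad
\tilde d = \frac{1}{m}\sum_{l=1}^{m} \tilde d_l,\qquad
\hat r = \frac{m+1}{k(m-1)}\bigl(\bar d - \tilde d\bigr),\qquad
D = \frac{\tilde d}{k\,(1+\hat r)},
\]
and refers $D$ to an F distribution with $k$ numerator degrees of freedom (the denominator degrees of freedom also depend on $\hat r$). Here $\hat r$ estimates the relative increase in variance due to missing information, which is how the fraction of missing information enters. Nothing in finite samples forces $\tilde d$ or $1+\hat r$ to be positive, because the pooled estimates need not improve the fit within any single completed dataset; hence the negative values of $D$.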

I then asked a wonderful student, Keith Chan, to seek the optimal parametrization. Soon he reported back that the problem was much worse than I had realized. The asymptotic equivalence I relied on is guaranteed only under the null hypothesis. But the procedure I proposed uses this equivalence to estimate a key nuisance parameter, the fraction of missing information (FMI). When the null fails, which is typically what we hope for, the FMI can be so badly estimated that the test may have essentially zero power!

How on earth did I not check for power? A consequence of rushing for publication? Carried away by one cute idea? A sign of research immaturity? All of the above! What depresses me the most is that all the defects of my proposal were automatically fixed by Keith’s “test of average”, which is guided by the likelihood principle. In contrast, my cute idea relies on an “average of tests”, guided by a computational trick rather than by statistical principles. Computational convenience should always be an important consideration. But when it becomes the driving force, we must keep in mind that computationally convenient bad procedures can do more harm than computationally inconvenient bad (and even good) procedures, precisely because convenience invites widespread use.
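In symbols, the contrast is roughly the following (schematic again, with $\ell_l$ denoting the complete-data log-likelihood from the $l$-th completed dataset). The statistics $\bar d$ and $\tilde d$ above average individual likelihood ratio statistics, an “average of tests”. The “test of average” instead averages the log-likelihood functions first, $\bar\ell(\theta)=m^{-1}\sum_{l}\ell_l(\theta)$, and then performs a single likelihood ratio test on the averaged function:
\[
d^{*} \;=\; 2\Bigl[\sup_{\theta\in\Theta}\bar\ell(\theta)\;-\;\sup_{\theta\in\Theta_0}\bar\ell(\theta)\Bigr],
\]
where $\Theta_0\subset\Theta$ is the null parameter space. Because $\Theta_0$ is contained in $\Theta$, $d^{*}$ is nonnegative by construction, and, like any likelihood ratio statistic, it is invariant to how the model is parametrized; the calibration needed to turn it into a valid multiple-imputation test is worked out in Keith’s paper.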

Apparently, I had not learned this lesson well when I set my sabbatical goal of completing 14 papers. It should have been to produce at least one paper that will still have positive impact in 140 years. Surely our professional reward systems cannot possibly rely on such long-term qualitative measures. But that is exactly why we need to remind ourselves constantly that unpublishing is impossible, to combat the tendency to pursue quantity over quality. Read and revise eight times before submitting.