Dietrich Stoyan and Sung Nok Chiu write:
The question of the origin of COVID-19 is of great importance to humankind: answering it would show how this pandemic emerged and how to prevent future pandemics caused by similar viruses. Currently, there are two competing theories, the natural origin (zoonosis) and the lab-leak theory. It caused a sensation in July 2022 when a paper published in Science claimed that the zoonosis hypothesis is true and pinpointed the Huanan Seafood Wholesale Market in Wuhan as the origin. This paper, Worobey et al. (2022), henceforth referred to as W after its first author, garnered hundreds of thousands of downloads and received global media coverage.
We became aware of their work, in preprint form, back in March 2022. Having identified its weaknesses, we prepared a critical paper in August 2022, shortly after W was published in Science. We attempted in vain to engage in discussion with the authors of W. We then learned that numerous serious researchers had doubts about the paper from their own scientific perspectives. However, most of the criticism focused on the poor quality of the data used, namely the residential addresses of the early December cases. In contrast, our concerns primarily revolved around the flawed statistical methods. We submitted our critique to JRSS A in September 2022. Following an extensive review process, our paper, Stoyan and Chiu (2024), was published in January 2024.
To find the main flaw of W, turn to page 2 of W, where you will find: “We also investigated whether the December COVID-19 cases were closer to the market than expected based on an empirical null distribution of Wuhan’s population density”, followed by some very small p-values. Unfortunately, neither the null hypothesis nor the test procedure is stated precisely. Interested readers need to dig deeper into the Supplementary Materials, which say that “null distributions were generated from the population density data […]. For each point in each pseudoreplicate the distance to Huanan was calculated, and the median […] distance to Huanan was calculated for each pseudoreplicate. The median […] distance between all the early December cases (N = 155) […] [was] compared to these null distributions.” We interpret these sentences as follows. A point pattern of 155 independent points was simulated according to Wuhan’s population density. The median of the distances from these 155 points to the Seafood Market (and not to the center of the new sample) was calculated. This procedure was repeated 1000 times, and the empirical distribution of these medians would be W’s null distribution, from which the p-value of their test could be determined.
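To make our reading of the procedure concrete, here is a minimal Python sketch of a Monte Carlo test of this kind. It is not W's code and it uses no Wuhan data. The uniform 60 km square standing in for the population density, the landmark position, the 4 km cluster spread, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

def median_distance(points, landmark):
    """Median Euclidean distance from the rows of an (n, 2) array to a landmark."""
    return float(np.median(np.linalg.norm(points - landmark, axis=1)))

def monte_carlo_p_value(observed, landmark, sample_null, n_sim=1000, rng=None):
    """One-sided Monte Carlo test: is the observed median distance to the
    landmark small compared with patterns simulated from the null model?"""
    rng = np.random.default_rng(rng)
    n = len(observed)
    obs_stat = median_distance(observed, landmark)
    # One median distance per simulated pattern of n independent points.
    null_stats = np.array(
        [median_distance(sample_null(n, rng), landmark) for _ in range(n_sim)]
    )
    # Standard Monte Carlo p-value with the +1 correction.
    p = (1 + np.sum(null_stats <= obs_stat)) / (n_sim + 1)
    return obs_stat, null_stats, float(p)

# --- Synthetic illustration (all numbers are assumptions, not real data) ---
SIDE_KM = 60.0                      # hypothetical city modelled as a square
landmark = np.array([30.0, 30.0])   # hypothetical reference location

def uniform_square(n, rng):
    """Stand-in for 'population density': n independent uniform points."""
    return rng.uniform(0.0, SIDE_KM, size=(n, 2))

rng = np.random.default_rng(2022)
# Synthetic 'cases': a tight cluster (standard deviation 4 km per coordinate).
observed = landmark + rng.normal(scale=4.0, size=(155, 2))

obs_stat, null_stats, p_value = monte_carlo_p_value(
    observed, landmark, uniform_square, n_sim=1000, rng=rng
)
print(f"observed median distance: {obs_stat:.2f} km")
print(f"smallest simulated median: {null_stats.min():.2f} km")
print(f"Monte Carlo p-value: {p_value:.4f}")
```

With 1000 simulations the smallest attainable p-value is 1/1001, which is what a tightly clustered pattern yields against a null that spreads points over the whole region.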
The null distribution employed by W is inappropriate. Each simulated pattern of 155 independent points has its points scattered across the entire area of Wuhan city, which is larger than New York City in both size and population. In contrast, the point pattern of the early cases forms a tight cluster. Without any computer work it is clear that only in extremely rare cases will the median distance of a simulated sample be smaller than that of the original points.
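The following Python sketch illustrates this point on synthetic data only. A tight cluster is tested against a "scattered across the whole city" null, with the median distance measured to three different reference locations. The square region, the cluster spread, and the locations A, B and C are hypothetical assumptions, not Wuhan coordinates, and the uniform null is a crude stand-in for a population density.

```python
import numpy as np

rng = np.random.default_rng(0)
SIDE_KM = 60.0    # hypothetical city modelled as a 60 km square (assumption)
N_CASES = 155
N_SIM = 1000

def median_distance(points, landmark):
    """Median Euclidean distance from the rows of an (n, 2) array to a landmark."""
    return float(np.median(np.linalg.norm(points - landmark, axis=1)))

# Synthetic tight cluster centred at (30, 30), standard deviation 4 km.
cluster = rng.normal(loc=[30.0, 30.0], scale=4.0, size=(N_CASES, 2))

# Hypothetical reference locations; none corresponds to a real place.
landmarks = {
    "A (cluster centre)": np.array([30.0, 30.0]),
    "B (3 km east of A)": np.array([33.0, 30.0]),
    "C (5 km north of A)": np.array([30.0, 35.0]),
}

# Simulate the null patterns once and reuse them for every landmark.
null_patterns = rng.uniform(0.0, SIDE_KM, size=(N_SIM, N_CASES, 2))

p_values = {}
for name, lm in landmarks.items():
    obs = median_distance(cluster, lm)
    # Median distance to this landmark for each simulated pattern.
    null = np.median(np.linalg.norm(null_patterns - lm, axis=2), axis=1)
    p_values[name] = float((1 + np.sum(null <= obs)) / (N_SIM + 1))
    print(f"{name}: observed median {obs:.2f} km, p = {p_values[name]:.4f}")
```

In this toy setting every reference location receives the smallest attainable p-value, 1/1001. The test therefore reflects only that the cases are clustered, and says nothing about which nearby location the cluster is associated with.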
W also considered the “center-point” of the point cloud of the early cases. Our investigations reveal that, based on their distance-based arguments, it is nearly impossible to single out the Seafood Market as the geometrical center while excluding other places in its proximity, such as the Hankou Railway Station, the Wuhan CDC, and the Wanda Plaza, because the coordinates of the 155 points are imprecise to some extent.
We are pleased to have published our critique. However, we are astonished that no other colleagues have reported similar findings. This situation raises doubts about the current state of the system of modern science and about the general understanding of basic statistical principles in modern society, not to mention the reviewing process of Science.
Upon receiving media inquiries from outlets such as Newsweek and Frankfurter Allgemeine Sonntagszeitung, the corresponding author of W, without providing any scientific arguments, described our paper as having “tunnel vision” and being “schockierend fehlerhaft” (shockingly flawed), further commenting that we, the authors, “miss the big scientific picture”. However, it is crucial to emphasize that scientifically, neither our statistical analysis nor that of W could reject or support either the natural origin or lab-leak theory.
References:
Dietrich Stoyan, Sung Nok Chiu, “Statistics did not prove that the Huanan Seafood Wholesale Market was the early epicentre of the COVID-19 pandemic”, Journal of the Royal Statistical Society Series A: Statistics in Society, 2024, qnad139, DOI: 10.1093/jrsssa/qnad139
Michael Worobey et al., “The Huanan Seafood Wholesale Market in Wuhan was the early epicenter of the COVID-19 pandemic”, Science, 377, 951-959, 2022, DOI: 10.1126/science.abp8715