Marianne Huebner, Michigan State University, has a plan to protect you from having to take these statistical painkillers for your data headaches. Devise your data collection to ensure that your statistics are transparent, reproducible, and trustworthy. She writes:

 

Imagine you contribute to a manuscript reporting an observational study on health outcomes. The analyses are complete, and the tables and figures look great. Now you are preparing a paper. At this stage, “pain points” may emerge. (These will be familiar to statisticians across projects and institutions.) Also, some of these issues may resurface during peer review, when analytic decisions must be clarified or revisited. They are not just frustrations that slow down the process of getting a paper submitted; they are warning signs of lack of transparency and weakened trustworthiness. Fortunately, many of these pain points are preventable through prespecified statistical analysis plans and reproducible workflows. Below are several common examples; this list is not comprehensive.

Example: You aim to report the study following the STROBE guidelines [1]. As part of this, you need to transparently describe how the final analysis sample was constructed. A flow diagram can be used to document how many participants were assessed for eligibility, how many were excluded (and why), and what the final analysis sample size was. During the analysis process you may have made changes: for example, you dropped some cases because it turned out they should have been excluded for one reason or another, or you found that a variable intended for the modeling was not usable.

Pain point: If these exclusions were not tracked systematically in the analysis script, it can be frustrating to reconstruct the exact sequence of steps and ensure that all numbers add up to the sample size shown in the participant flow diagram; discrepancies will be evident to attentive readers.
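One way to avoid this reconstruction is to let the analysis script itself record every exclusion as it happens. The following is a minimal sketch in Python, not a prescribed method: the `FlowLog` class, the toy records, and the two exclusion criteria are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class FlowLog:
    """Hypothetical helper: logs each exclusion step for a flow diagram."""
    steps: list = field(default_factory=list)

    def apply(self, rows, reason, keep):
        """Keep rows for which keep(row) is true; record how many were dropped."""
        kept = [r for r in rows if keep(r)]
        self.steps.append({
            "reason": reason,
            "n_before": len(rows),
            "n_excluded": len(rows) - len(kept),
            "n_after": len(kept),
        })
        return kept


# Made-up records, for illustration only.
records = [
    {"id": 1, "age": 34, "consent": True},
    {"id": 2, "age": 16, "consent": True},
    {"id": 3, "age": 51, "consent": False},
    {"id": 4, "age": 47, "consent": True},
    {"id": 5, "age": 29, "consent": True},
]

log = FlowLog()
sample = log.apply(records, "age under 18", lambda r: r["age"] >= 18)
sample = log.apply(sample, "no consent", lambda r: r["consent"])

# Consistency check: assessed minus all exclusions equals the analysis sample.
n_assessed = log.steps[0]["n_before"]
n_excluded = sum(s["n_excluded"] for s in log.steps)
assert n_assessed - n_excluded == len(sample)

for s in log.steps:
    print(f'{s["reason"]}: excluded {s["n_excluded"]}, remaining {s["n_after"]}')
```

Because every exclusion passes through one function, the numbers for the flow diagram come straight from the log, and the final assertion fails as soon as they stop adding up.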

Example: The Results section typically begins with a description of the analysis population and a table describing the variables included in the models—often referred to as “Table 1.”

Pain point: If the number of missing values for each variable was not recorded initially, revising an otherwise carefully constructed table can be tiresome.
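Recording missingness can be made a routine part of building the descriptive table. The sketch below is one simple way to do it in Python, assuming that missing values are coded as `None`; the `missing_summary` function, the variable names, and the data are hypothetical.

```python
def missing_summary(rows, variables):
    """Count and percentage of missing (None) values for each variable."""
    n = len(rows)
    summary = {}
    for v in variables:
        n_missing = sum(1 for r in rows if r.get(v) is None)
        pct = round(100 * n_missing / n, 1) if n else 0.0
        summary[v] = {"n_missing": n_missing, "pct_missing": pct}
    return summary


# Made-up data, for illustration only.
rows = [
    {"age": 34, "bmi": 22.1, "smoker": "no"},
    {"age": 51, "bmi": None, "smoker": "yes"},
    {"age": None, "bmi": 27.4, "smoker": "no"},
    {"age": 47, "bmi": 30.2, "smoker": None},
    {"age": 29, "bmi": 24.8, "smoker": "no"},
]

summary = missing_summary(rows, ["age", "bmi", "smoker"])
for variable, stats in summary.items():
    print(f'{variable}: {stats["n_missing"]} missing ({stats["pct_missing"]}%)')
```

Storing this summary alongside the table means a missingness column can be added later without recomputing anything by hand.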

Example: You may realize that the models were not fit using the full dataset described in Table 1. This can occur when some covariates have missing values or additional exclusion criteria are applied during analysis.

Pain point: Table 1 must be revised to reflect the population used for inference, including updated frequencies, proportions, means, quantiles, or missingness, which is an arduous process. Mismatches between tables and analyses can undermine clarity and credibility.

Example: Modeling introduces further complexity. Suppose several covariates have missing values. Although the proportion missing for any single variable is small and a complete-case analysis seems justified, the combination of variables may result in a much larger proportion of incomplete records that needs to be addressed in the analysis. If the outcome is an event, then the number of events must also be reported. Each model may include a different set of variables or be fit to a different subset of the data, and therefore requires reporting its own sample size and number of events [2].

Pain point: If this information has not been tracked, it must be reconstructed from scripts and output, possibly by repeating analyses.
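The per-model bookkeeping can be captured when each model is specified. The following Python sketch assumes a complete-case analysis, a binary outcome coded 0/1, and missing values coded as `None`; the function names, model labels, and data are hypothetical.

```python
def complete_cases(rows, variables):
    """Rows with no missing (None) value in any of the listed variables."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def model_sample_report(rows, outcome, covariates):
    """Sample size and number of events actually available to one model."""
    used = complete_cases(rows, [outcome] + covariates)
    return {
        "covariates": covariates,
        "n_total": len(rows),
        "n_used": len(used),
        "n_events": sum(r[outcome] for r in used),
    }


# Made-up data: each covariate is missing in only one of six records.
rows = [
    {"event": 1, "age": 34, "bmi": 22.1, "smoker": 0},
    {"event": 0, "age": 51, "bmi": None, "smoker": 1},
    {"event": 1, "age": None, "bmi": 27.4, "smoker": 0},
    {"event": 0, "age": 47, "bmi": 30.2, "smoker": None},
    {"event": 0, "age": 29, "bmi": 24.8, "smoker": 0},
    {"event": 1, "age": 62, "bmi": 28.9, "smoker": 1},
]

models = {
    "age only": ["age"],
    "fully adjusted": ["age", "bmi", "smoker"],
}
reports = {name: model_sample_report(rows, "event", covs)
           for name, covs in models.items()}

for name, rep in reports.items():
    print(f'{name}: n = {rep["n_used"]} of {rep["n_total"]}, events = {rep["n_events"]}')
```

In this toy example each covariate is missing in one record out of six, yet the fully adjusted model retains only three of the six records, which illustrates how small amounts of missingness per variable can combine into a substantial loss.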

Example: The Discussion section requires interpreting findings and situating them within the existing literature. During this process, you may encounter prior studies that you had not considered earlier—studies that adjusted for an important variable omitted from your analyses, posed related but distinct research questions, or highlighted future research directions you had not anticipated. You suddenly feel uncertain about whether you asked all the right questions and truly mapped the substantive research question to the appropriate statistical question, as David Hand writes on the previous page [in the June/July 2026 IMS Bulletin].

Pain point: If the newly identified variable is deemed essential, or if a newly encountered research question is relevant and can be addressed with your existing data, additional analyses become necessary. This requires fitting new models and constructing new results tables so that your findings can be compared with those of the existing studies.

 

At first glance, these pain points seem like mere inconveniences: unanticipated detours that delay writing or submission. But addressing them is not optional. Anticipating and preventing these problems is part of our professional responsibility and of ethical research practice. Doing so reflects a commitment to:

Transparency: Making analytic decisions visible and understandable, rather than reconstructing them after the fact.

Reproducibility: Ensuring that analyses can be repeated or extended—including by your future self.

Trustworthiness: Demonstrating that the analytic process and reported results are credible, any limitations and potential bias are described, and claims are supported by evidence from the data [3].

Practical strategies for minimizing these pain points include developing a statistical analysis plan, conducting careful initial data analysis, systematically documenting exclusions and missingness, and maintaining reproducible workflows [4]. Such practices not only streamline manuscript preparation but also substantially reduce the burden—and risk—associated with responding to reviewer requests for clarification, reanalysis, or expanded reporting. Although some challenges in manuscript preparation or in addressing reviewer comments may be out of your control, these approaches greatly reduce the likelihood that preventable problems surface precisely when a manuscript is ready for submission.

 

References

[1] von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008 Apr;61(4):344-9. doi: 10.1016/j.jclinepi.2007.11.008.

[2] Sauerbrei W, Haeussler T, Balmford J, Huebner M. Structured reporting to improve transparency of analyses in prognostic marker studies. BMC Med. 2022 May 12;20(1):184. doi: 10.1186/s12916-022-02304-5.

[3] Nosek BA, Allison DB, Jamieson KH, McNutt M, Nielsen AB, Wolf SM. A framework for assessing the trustworthiness of scientific research findings. Proc Natl Acad Sci U S A. 2026 Feb 10;123(6):e2536736123. doi: 10.1073/pnas.2536736123.

[4] Baillie M, le Cessie S, Schmidt CO, Lusa L, Huebner M; Topic Group “Initial Data Analysis” of the STRATOS Initiative. Ten simple rules for initial data analysis. PLoS Comput Biol. 2022 Feb 24;18(2):e1009819. doi: 10.1371/journal.pcbi.1009819.