Our contributing editor Marianne Huebner teams up with Michael Wallace, University of Waterloo, Ontario, Canada, to explore measurement error in data collection and the statistical remedies available for it.

Photo: Measuring the Force…
The organizers arranged for a memorable visitor during their grip strength testing at the World Masters Weightlifting Championships in Orlando, Florida.
I recently had the opportunity to participate in collecting data on grip strength in athletes. At first, I was nervous, since instructing participants on proper measurement technique is not my area of expertise. Fortunately, the study followed a carefully specified protocol. Grip strength was measured using a device known as a dynamometer, and participants were instructed to hold it with their elbow flexed at approximately 90 degrees. There were three repeated trials per hand.
Despite the careful protocol, the experience was eye-opening. I knew that measurement error (the difference between our measured values and what we are trying to measure) is a pervasive problem in most applied settings. My involvement in this study raised many questions about the challenges measurement error can present, even in seemingly simple settings. For example:
• Grip strength measurements are not standardized across studies. Some studies, for instance, instruct participants to hold the instrument with the arm fully extended, introducing systematic variation that complicates comparisons across datasets.
• The examiner can influence results through differences in instruction and encouragement, for example by forgetting to tell participants not to press their arm against their rib cage. Enthusiastic prompts such as “GO–GO–GO! Squeeze as HARD as you can!” may elicit greater maximal effort than more neutral instructions like “Squeeze as hard as you can.” Such differences can translate into variability in recorded values.
• Recording measurements can also introduce error. In some datasets, the distribution of recorded grip strength values shows distinct spikes at rounded numbers, reflecting digit preference rather than true underlying variability [1].
• The timing of measurement can be a source of variability. In a competition setting, participants tested before the event may experience anticipatory stress, while those tested afterward may be fatigued, which can reduce their maximal force.
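A simple way to screen for the digit preference described above is to tabulate the last digit of each recorded value and compare the counts with the equal counts expected if no digit were favored. The sketch below is a hypothetical illustration, not the analysis in [1]: it assumes readings are recorded in whole kilograms, uses made-up values, and applies a Pearson chi-square statistic as one common screening choice.

```python
from collections import Counter

# Hypothetical grip strength readings in whole kilograms (made-up values).
readings = [40, 35, 42, 45, 50, 38, 40, 30, 45, 47,
            50, 40, 35, 41, 55, 45, 40, 36, 50, 60]

def terminal_digit_counts(values):
    """Count how often each final digit 0-9 appears (assumes whole-kg values)."""
    counts = Counter(int(v) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts for every digit."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

counts = terminal_digit_counts(readings)
stat = chi_square_uniform(counts)
# 16.92 is (rounded) the 95th percentile of a chi-square distribution with 9 degrees of freedom.
CRITICAL_9DF_05 = 16.92
print("terminal digit counts:", counts)
print(f"chi-square = {stat:.2f}, digit preference flagged: {stat > CRITICAL_9DF_05}")
```

With only 20 readings the expected count per digit is 2, well below the usual rule of thumb of about 5, so the chi-square approximation is rough here; a real check would use the full dataset. If values were recorded to 0.1 kg, the digit extraction would need to look at the first decimal instead.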
Different outcome variables are also possible: the maximum grip strength from three attempts with one hand, the average of repeated attempts, or the asymmetry between left and right hands. It is therefore critical at the outset of any study to ask, “What are we trying to measure?” Once this question has a precise answer, steps should be taken to ensure that the measurements reflect that quantity as accurately as possible. This includes standardized protocols, careful training, and precise data recording.
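To make the point concrete, the sketch below computes three candidate outcomes from the same three-trials-per-hand data. The readings are made up, and the asymmetry index (absolute difference of the per-hand maxima, divided by the larger maximum) is one hypothetical definition; others are possible.

```python
from statistics import mean

# Hypothetical readings (kg) for one participant: three trials per hand.
trials = {"left": [38.5, 40.0, 39.0], "right": [42.0, 43.5, 41.0]}

def max_one_hand(trials, hand):
    """Maximum grip strength over the trials of a single hand."""
    return max(trials[hand])

def mean_one_hand(trials, hand):
    """Average of the repeated attempts for a single hand."""
    return mean(trials[hand])

def asymmetry(trials):
    """Hypothetical asymmetry index: relative gap between per-hand maxima (0 = symmetric)."""
    left, right = max(trials["left"]), max(trials["right"])
    return abs(left - right) / max(left, right)

print(f"max (right):  {max_one_hand(trials, 'right'):.1f} kg")
print(f"mean (right): {mean_one_hand(trials, 'right'):.2f} kg")
print(f"asymmetry:    {asymmetry(trials):.3f}")
```

The same participant yields 43.5 kg, 42.17 kg, or an asymmetry of about 0.08 depending on the definition chosen, which is why the outcome has to be fixed before data collection begins.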
This illustrates a subtle but important distinction: not all variability arises from measurement error. Some variability reflects genuine fluctuation in the quantity being measured. I asked Michael Wallace, an expert on measurement error, about statistical remedies. We discussed how statistical adjustment can address certain forms of measurement error, such as variation across repeated grip strength measurements. However, some sources of error can be much harder, or even impossible, to adjust for using statistical methods.
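Repeated trials are what make such adjustment possible. A minimal sketch, assuming a balanced design (the same number of trials per person) and made-up right-hand readings, splits the observed variance into a within-person part and a between-person part using one-way random-effects (ANOVA) estimators. In line with the distinction above, the within-person variance lumps instrument and recording error together with genuine trial-to-trial fluctuation, such as fatigue or changing effort; these data alone cannot separate the two.

```python
from statistics import mean, variance

# Hypothetical right-hand readings (kg): three trials for each of five participants.
trials = [
    [42.0, 43.5, 41.0],
    [30.5, 31.0, 29.5],
    [55.0, 53.5, 56.0],
    [38.0, 39.5, 38.5],
    [47.0, 45.5, 46.5],
]

def variance_components(trials):
    """One-way random-effects (ANOVA) estimates for a balanced replicate design.

    Returns (within-person variance, between-person variance of true values).
    """
    k = len(trials[0])  # trials per person; every row is assumed to have this length
    # Within-person mean square: average of the per-person sample variances.
    within = mean(variance(row) for row in trials)
    # Between-person mean square: k times the sample variance of the person means.
    ms_between = k * variance([mean(row) for row in trials])
    # Truncate at zero, since this moment estimator can go negative in small samples.
    between = max((ms_between - within) / k, 0.0)
    return within, between

within, between = variance_components(trials)
k = len(trials[0])
reliability_single = between / (between + within)    # reliability of one trial
reliability_mean = between / (between + within / k)  # reliability of the mean of k trials
print(f"within-person SD {within ** 0.5:.2f} kg, between-person SD {between ** 0.5:.2f} kg")
print(f"reliability: one trial {reliability_single:.3f}, mean of {k} trials {reliability_mean:.3f}")
```

The reliability ratio computed here is the quantity that reappears in corrections such as regression calibration; averaging k independent trials shrinks the within-person contribution by a factor of k.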
Michael also advises recognizing when precise measurements are simply not possible or practical because of logistical or cost constraints. In such contexts, study design can still aim to minimize (rather than remove) errors. Sometimes it may be better to take fewer, more precise measurements than more, less precise ones. It is often desirable to plan for repeated measurements, as we did in the grip strength study, because they can provide insight into the scale of the error.
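The trade-off between fewer precise and more imprecise measurements can be sketched with a little arithmetic. Assume, purely for illustration, that each reading equals the true value X plus independent, unbiased error with standard deviation σ, so an average of n readings has error variance σ²/n; the numbers of readings and error sizes below are hypothetical.

```latex
\operatorname{Var}(\bar{W}_n \mid X) = \frac{\sigma^2}{n}, \qquad
n = 3,\ \sigma = 2\,\text{kg} \;\Rightarrow\; \frac{2^2}{3} \approx 1.33\,\text{kg}^2, \qquad
n = 6,\ \sigma = 3\,\text{kg} \;\Rightarrow\; \frac{3^2}{6} = 1.5\,\text{kg}^2
```

Under these assumptions three precise readings beat six less precise ones. The calculation covers random error only: averaging does nothing to remove a bias, such as a fatigue effect, shared by all readings.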
When measurement error is unavoidable, it is important to understand its implications. Michael noted that it can be tempting to analyze one’s data as if they were accurate, and misconceptions abound that errors ‘average out’ or only weaken effect estimates. However, the impact of measurement error can be severe even in simple settings, and it depends on a variety of factors. For example, is the error ‘random’, or biased in one direction? Fatigue, for instance, may drag grip strength measurements downwards, which could raise greater concerns than measurements that are accurate, at least on average. In grip strength testing, fatigue can be reduced with short breaks between measurements. Regardless of the structure of the error, it must be accounted for during analysis, and there are many statistical tools we can turn to, such as regression calibration or simulation extrapolation (SIMEX).
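A small simulation illustrates why errors do not simply ‘average out’. It is a sketch under stated assumptions, not a model of the grip strength study: a hypothetical outcome y depends linearly on true grip strength x with slope 0.5, and we record w = x + shift + u, where u is random error. With purely random error (shift = 0), the naive slope of y on w is pulled towards zero by the reliability ratio var(x) / (var(x) + var(u)) = 100 / 125 = 0.8, even with 20,000 simulated people. A constant downward shift, a deliberately simple stand-in for fatigue, leaves this slope essentially unchanged but biases the mean grip strength itself.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Ordinary least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, beta=0.5, sd_x=10.0, sd_u=5.0, shift=0.0, seed=1):
    """Hypothetical data: true grip strength x, outcome y, and recorded w = x + shift + u."""
    rng = random.Random(seed)
    x = [rng.gauss(40.0, sd_x) for _ in range(n)]        # true grip strength (kg)
    y = [beta * xi + rng.gauss(0.0, 2.0) for xi in x]    # outcome driven by the true value
    w = [xi + shift + rng.gauss(0.0, sd_u) for xi in x]  # what we actually record
    return x, y, w

x, y, w = simulate()
print(f"slope using true x:     {ols_slope(x, y):.3f}")  # near the true 0.5
print(f"slope using measured w: {ols_slope(w, y):.3f}")  # near 0.5 * 0.8 = 0.4
_, _, w_tired = simulate(shift=-3.0)  # same seed, so same people, with readings 3 kg low
print(f"mean bias with shift:   {mean(w_tired) - mean(x):.2f} kg")  # near -3
print(f"slope with shift:       {ols_slope(w_tired, y):.3f}")      # essentially the unshifted slope
```

Which kind of error matters more depends on the question: random error distorts the association here, while the systematic shift distorts the average.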
However, employing statistical techniques to retrospectively account for the consequences of measurement error can be challenging. The choice of methods depends on what data we have available (such as repeated measurements), as well as the size and structure of the measurement error itself. While analyses that fail to account for measurement error will almost always be fundamentally flawed, techniques to retrospectively correct for it must be employed with caution.
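As one example of a retrospective correction, the sketch below applies regression calibration when each person has two replicate readings. It assumes classical, additive error (independent of the true value, of each other, and of the outcome) and a simple linear model; the data are simulated and the function name is hypothetical, not taken from any particular package. The replicate differences estimate the error variance, which gives a reliability ratio λ for the two-reading average, and the average is then replaced by its best linear prediction of the true value.

```python
import random
from statistics import mean, variance

def ols_slope(x, y):
    """Ordinary least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def regression_calibration(w1, w2, y):
    """Hypothetical helper: naive and regression-calibrated slopes from two replicates.

    Assumes classical error: w_j = x + u_j, with the u_j independent of x,
    of each other, and of y given x.
    """
    w_bar = [(a + b) / 2 for a, b in zip(w1, w2)]
    # Var(w1 - w2) = 2 * sigma_u^2, so half of it estimates one reading's error variance.
    sigma_u2 = variance([a - b for a, b in zip(w1, w2)]) / 2
    var_wbar = variance(w_bar)
    # Reliability of the two-reading average, whose error variance is sigma_u2 / 2.
    lam = (var_wbar - sigma_u2 / 2) / var_wbar
    center = mean(w_bar)
    # Best linear predictor of the true value given the average.
    x_hat = [center + lam * (wb - center) for wb in w_bar]
    return ols_slope(w_bar, y), ols_slope(x_hat, y), lam

# Simulated data (all values hypothetical): true slope 0.5, error SD 5 kg per reading.
rng = random.Random(7)
n = 20000
x = [rng.gauss(40.0, 10.0) for _ in range(n)]
y = [0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]
w1 = [xi + rng.gauss(0.0, 5.0) for xi in x]
w2 = [xi + rng.gauss(0.0, 5.0) for xi in x]

naive, corrected, lam = regression_calibration(w1, w2, y)
print(f"reliability {lam:.3f}, naive slope {naive:.3f}, corrected slope {corrected:.3f}")
```

In this single-covariate linear setting the corrected slope is simply the naive slope divided by λ. With additional covariates or non-linear models, regression calibration is an approximation, and an error component shared by both readings (for example, a person-specific fatigue effect on every trial) is invisible to the replicate differences and so goes uncorrected. SIMEX, the other tool mentioned above, is not shown here.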
We advise that the best solution to measurement error in your data is to avoid it in the first place. By integrating measurement error considerations into your study design, you can limit its impact, if not remove it completely. But if errors persist, there are statistical remedies, as long as they are used with care. For a more detailed but accessible introduction to measurement error, please see Michael’s “Analysis in an imperfect world” [2] or, for further information, the STRATOS Initiative’s measurement error website [3] and the resources therein.
—
References
[1] Lusa, L., Huebner, M. (2021). Organizing and analyzing data from the SHARE study with an application to age and sex differences in depressive symptoms. International Journal of Environmental Research and Public Health, 18(18): 9684.
[2] Wallace, M. (2020). Analysis in an imperfect world. Significance, 17: 14-19.
[3] STRATOS Initiative. Measurement error website. https://www.stratostg4.statistik.uni-muenchen.de/Home.html