Geoff-Hart.com: Editing, Writing, and Translation


To avoid misleading metrics, understand what you’re measuring and why

by Geoffrey Hart

Previously published as: Hart, G. 2025. To avoid misleading metrics, understand what you’re measuring and why. https://www.worldts.com/english-writing/505/index.html

One virtue of scientific thinking is that it tries to replace subjective assessments with quantifiable measurements that are more objective and easier to compare. This approach has greatly advanced many areas of endeavor, including both human-centered fields such as medicine and more mechanistic fields such as chemistry, but it can also blind us to important problems. One of the most important is the incorrect belief that just because you can measure something (or assign a number to it), the number is meaningful.

Note: Measurements, or indexes based on measurements (also called indicators), are called metrics, from the Greek word metreo (“to measure or compare”).

Consider the example of a widely used medical index, the body mass index (BMI). This index was designed to determine whether a population of individuals has a generally healthy body weight or is (on average) obese, so that its members need to lose weight. The index is calculated as follows:

BMI = weight (kg) / height (m) squared

You can calculate your own BMI using the online calculator provided by the U.S. National Institutes of Health. But why would you? BMI was not designed to describe an individual’s status; rather, it is a population-level measure and is therefore of questionable value for individuals. Examining the two parts of the indicator reveals the problem. First, relying only on weight, without qualifying the weight’s nature, ignores the factors that contribute to weight. It therefore fails to distinguish between a robustly healthy individual, such as my female friend who is a champion powerlifter, and someone who’s clinically obese. That is, the equation treats 1 kg of muscle as identical to 1 kg of adipose tissue (“fat”). Second, height is a linear measure, which is then squared to produce an area measure. Yet bodies are three-dimensional, and height alone is a poor proxy for that complexity. Squaring the height term exacerbates this problem.
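The first problem can be made concrete with a minimal Python sketch of the formula above. The two people, their weights and heights, and their body-fat fractions are hypothetical values invented for illustration; the point is only that the formula has no input for body composition.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical individuals: same weight and height, very different
# body composition. The body_fat values are invented for illustration
# and are NOT used by the formula.
people = {
    "powerlifter": {"weight_kg": 90.0, "height_m": 1.70, "body_fat": 0.18},
    "sedentary": {"weight_kg": 90.0, "height_m": 1.70, "body_fat": 0.40},
}

for label, person in people.items():
    value = bmi(person["weight_kg"], person["height_m"])
    print(f"{label}: BMI = {value:.1f}, body fat = {person['body_fat']:.0%}")
```

Both lines report the same BMI because the calculation sees only weight and height, which is exactly why the number cannot separate these two individuals.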

The result is what I call “a garbage metric”: one that is applied outside the purpose for which it was designed or that was poorly designed in the first place. It seems to measure something important, but only indirectly describes the thing being measured. BMI can indeed reveal something important, such as the consequences of poor diet and insufficient activity in a population. But it is not, by itself, sufficient to confirm the problem in an individual.

Note: I chose the term “garbage” based on the famous computer science maxim that if your input data is garbage, your output (conclusion) is also likely to be garbage.

Consider a second example, which most of us have experienced when visiting a doctor about pain. Doctors generally ask us to describe the pain’s intensity. In North America, this is commonly done using a pain scale from 1 (a minor annoyance) to 10 (the worst pain the patient can imagine). The problem should be clear: if you’ve been fortunate enough to live a largely pain-free life, the amount of pain you can imagine is limited, whereas if you suffer from chronic severe pain, you can imagine pain that a pain-free individual never could. If you’ve experienced a kidney stone or childbirth, you can imagine far worse pain than someone whose worst pain was caused by a bruise or headache. Moreover, the most common version of this scale lacks definitions for intermediate values, making it impossible to reliably assign intermediate values to a patient’s pain; asking the same question about exactly the same amount of pain can produce a different numerical value each time. Another garbage metric!

Consider, instead, a pain metric that is based on an effort to determine the pain’s impact on the patient. This revised scale could use a similar range of values (0 to 10), but might include the following categories:

0 = no pain
1 to 3 = I only noticed the pain because you asked.
4 to 6 = I feel enough pain that I cannot perform a normal activity, such as bending over to touch my toes.
7 to 9 = The pain is so severe that I am having a hard time breathing.
10 = [Pain so intense that the patient couldn’t stop screaming to respond to the question.]

This metric is much more useful because, in addition to clearly defining the meaning of the categories (so that asking the same question repeatedly will obtain approximately the same answer each time), it expresses the pain in terms that are consistent across patients and meaningful because they reveal the pain’s impact. After all, the goal of the question is to learn what the pain means to the patient so you can decide how aggressively to treat it.
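One way to see why defined categories make answers repeatable is to write the scale as a lookup, as in the following Python sketch. The function name and the shortened category labels are hypothetical, and the sketch assumes the bands are 1 to 3, 4 to 6, and 7 to 9, so that every score falls into exactly one category.

```python
def describe_pain(score: int) -> str:
    """Map a 0-10 pain score to an impact-based category.

    Assumes non-overlapping bands (0, 1-3, 4-6, 7-9, 10); the labels
    are shortened paraphrases of the categories in the list above.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score == 0:
        return "no pain"
    if score <= 3:
        return "noticed only when asked"
    if score <= 6:
        return "cannot perform a normal activity"
    if score <= 9:
        return "hard time breathing"
    return "cannot stop screaming to respond"


for score in (0, 2, 5, 8, 10):
    print(score, "->", describe_pain(score))
```

Because each score maps to exactly one description, two clinicians (or the same clinician on two days) who observe the same impact should arrive at the same band.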

Another problem relates to the implied meaning of a metric. When you choose variable names, ensure that they reflect the meaning of the data. For example, many researchers use the conventions of binary logic to divide results into two categories, with 1 = yes or true or present and 0 = no or false or absent. This works well for binary choices (e.g., treated for pain versus not treated) because 0 represents the absence of a value, so any non-zero value must mean the presence of a value. (This would not be true for variables that can take many intermediate values, such as correlation coefficients.)

There are potential problems with variable names that contradict the meaning of the variable’s value. If you use a variable named “response strength” with values ranging from 1 to 10, it’s clearer to use 1 (a low number) for a low strength and 10 (a high number) for a high strength. Using 1 = the highest strength and 10 = the lowest strength, an order used for rankings in athletic competitions, is contradictory because a low number for an athlete (1) is associated with a high value (the highest strength or skill). This kind of mismatch between the variable name and the variable’s value encourages misunderstandings and interpretation errors. In my work as a scientific editor, I often see authors interpret their own data incorrectly because the scale they defined runs in the opposite direction to the meaning of the variable name.
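If your data were already recorded on a scale that runs against the variable name, one common remedy is to reverse-code the values before analysis. The following Python sketch shows the arithmetic; the function name and the sample ranks are hypothetical, and it assumes a closed integer scale from 1 to 10.

```python
def reverse_score(score: int, low: int = 1, high: int = 10) -> int:
    """Flip a score on a closed integer scale so the lowest value
    becomes the highest and vice versa (low + high - score)."""
    if not low <= score <= high:
        raise ValueError(f"score must be between {low} and {high}")
    return low + high - score


# Hypothetical ranks recorded with 1 = strongest and 10 = weakest.
ranks = [1, 3, 10]

# After reversal, a larger number means a stronger response, which
# matches a variable named "response strength".
response_strength = [reverse_score(rank) for rank in ranks]
print(response_strength)  # [10, 8, 1]
```

Reversing once, in one clearly named step, is less error-prone than remembering the inverted direction every time you interpret a result.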

The meaning of a number always depends on its context, and it’s unwise to forget that context when you develop metrics. Think very carefully about what problem you’re trying to solve or what message you’re hoping to communicate. The fact that you’re creating a number is, by itself, unimportant. Without understanding the number’s meaning to yourself and your readers, you’re at high risk of creating a garbage metric.