During the Vietnam War, enemy body counts were treated as a precise and objective measure of success. The rationale was that maximising enemy deaths while minimising one's own would assure victory.
But what Robert McNamara, the US Secretary of Defense at the time, did not realise was that the guerrilla warfare waged by the Vietnamese, along with widespread resistance, inevitably led to massive inaccuracies in the estimates of enemy casualties.
McNamara also lowered the military's admission standards during this period, swelling the number of enlistees. His idea was that one man was more or less equal to another as a soldier, and that the right training and superior equipment, not the abilities of the individuals themselves, would be the major causes of victory.
And we all know how the Vietnam War panned out for the USA (quite badly!).
McNamara was a man who reduced war to a mathematical model. He was prone to making decisions based solely on quantitative observations and metrics, ignoring everything else. The reason he often gave was that qualitative observations could not be proven, and were therefore invalid.
This today is known as the McNamara Fallacy.
"The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide."
— Daniel Yankelovich
Renowned architect Christopher Alexander conveys a similar sentiment in his book ‘Notes on the Synthesis of Form’.
“The existence of a performance standard, and the association of a numerical scale with a misfit variable, does not mean that the misfit is any more keenly felt in the ensemble when it occurs. There are of course many, many misfits for which we do not have such a scale. Some typical examples are "boredom in an exhibition," "comfort for a kettle handle," "security for a fastener or a lock," "human warmth in a living room," "lack of variety in a park." No one has yet invented a scale for unhappiness or discomfort or uneasiness, and it is therefore not possible to set up performance standards for them. Yet these misfits are among the most critical which occur in design problems. The importance of these nonquantifiable variables is sometimes lost in the effort to be ‘scientific.’”
As people building products, we often over-rely on the metrics our tools measure for us, thinking they’re the only things worth measuring.
But it’s good to remember every once in a while that the world is analogue — with infinite complexity and gradation — while the metrics we create are digital, and will inevitably miss out on the real causes of observed user behaviour.
"The thing I have noticed is when the anecdotes and the data disagree, the anecdotes are usually right. There's something wrong with the way you are measuring it.”
— Jeff Bezos
Then there is also Goodhart’s Law, which states that when a measure becomes a target to optimise, it ceases to be a good measure. The reason is simple: people’s incentives are now aligned with optimising that metric, since their performance as employees is judged by how it looks on the dashboard. And so they will try to game the metric, even if the things they do to move it actually hamper the user experience.
I’m reminded of a tweet by Rory Sutherland in which he found out why thread count is not such a good measure of fabric quality: thread count, as a metric, got gamed by the market.
“Whatever way you come to "measure" in order to verify some supposed positive externality, will become gamed in a way that is cheaper and doesn't produce said positive externality. All metrics that are targets get gamed.”
— Joe Norman, Complexity Scientist
There is also our tendency to use metrics to confirm our biases rather than to challenge them.
Wittgenstein’s Ruler: If you have no confidence in the ruler, or if the ruler is ill-designed, what it will measure and reveal is not the phenomenon but your own biases.
Replace ‘ruler’ with any metric you love and you’ll get my point.
Hence, it is always a good idea to have multiple ways of measuring the same thing. You know, pilots in the 20th century used to do exactly that.
They had two gas gauges in the cockpit of their plane, and they still followed a method called “sticking”, which meant dipping a wooden dowel through the top of the wing into the gas tank and judging the gas level by the height of the resulting wetness.
Why? The needles on those gas gauges often got stuck to the glass, and would report faulty gas levels!
The lesson: Trust, but verify.
But you’ll often find that people with skin in the game do this naturally, out of a self-preservation instinct.
Skepticism about what you’re measuring helps; otherwise the plane crashes, and the market eats you up.