Mike McCormick




The Data Detective

author: Tim Harford

date completed: 2021-12-29

--Beware your feelings and what they bias you to believe. We believe what we want to believe, selecting facts that support our view while ignoring those that don't. Before anything else, examine your feelings and what they incline you to believe; ask yourself, "how does this make me feel?"

--Weigh stories from two viewpoints - personal experience vs statistical summary. Be wary of Goodhart's law: "A statistical metric may be a pretty decent proxy for something that really matters, but it is almost always a proxy rather than the real thing. Once you start using that proxy as a target to be improved, or a metric to control others at a distance, it will be distorted, faked, or undermined. The value of the measure will evaporate."

--Avoid premature enumeration - clarify what is being measured and how; what definition is being used? Start by asking what the claim actually means, in context! E.g., an apparent rise in death rates may really be due to a change in classification, not an actual change in the death rate.

--When presented with statistics, stop and think about context first. What do I feel about the numbers, why are they being presented, what is being presented, how was it counted, by what definition?

--Step back and look for data to put the statistics in context. Don't think about it in isolation; what are the trends, what can it be compared to, what can give you a sense of scale.

--Get the backstory: where did the data come from and how was it collected? What was the sample size? How large was the effect? Was it statistically significant? Beware the replication crisis; don't rely on one study - what does the preponderance of the literature, covering multiple methods and results, suggest?
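A quick simulation (a toy sketch with made-up numbers, not from the book) shows why one study is shaky ground: a single small study can land far from the true effect, while pooling many studies converges on it.

```python
import random

random.seed(42)

TRUE_EFFECT = 0.2  # pretend we know the real effect size

def run_study(n):
    """Simulate one study: the average of n noisy observations of the effect."""
    return sum(random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)) / n

# One small study can land far from the truth...
single = run_study(30)

# ...while pooling many studies (a crude stand-in for reading
# the whole literature) converges toward the true effect.
pooled = sum(run_study(30) for _ in range(50)) / 50

print(f"single study estimate: {single:+.2f}")
print(f"pooled estimate:       {pooled:+.2f}")
```

The pooled average is no substitute for checking methods and results across studies, but it illustrates why any one result deserves suspicion.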

--Ask who is missing, and how that may bias results. Was the study only conducted on WEIRD subjects (Western, Educated, and from Industrialized Rich Democracies)? Male vs female, young vs old? Was it a random, representative sample or was there sampling bias?

--Don't confuse the right data with data that is easier to collect - that's how sampling bias creeps in!
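A minimal sketch of that trap, using invented numbers: an easy-to-collect online poll, where supporters are likelier to respond, overstates support, while a random sample of the same size stays near the truth.

```python
import random

random.seed(0)

# Hypothetical population: 10,000 people, 30% support a policy,
# but supporters are twice as likely to answer an online poll.
population = [1] * 3000 + [0] * 7000

def online_poll(pop, k):
    """Convenience sample: supporters (1s) are overweighted 2:1."""
    weights = [2 if person else 1 for person in pop]
    return random.choices(pop, weights=weights, k=k)

def random_poll(pop, k):
    """Uniform random sample of the same size."""
    return random.sample(pop, k)

biased = sum(online_poll(population, 1000)) / 1000
unbiased = sum(random_poll(population, 1000)) / 1000

print(f"easy-to-collect sample: {biased:.0%} support")   # inflated
print(f"random sample:          {unbiased:.0%} support") # near 30%
```

Same sample size, same population - the only difference is how the sample was gathered.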

--Beware using correlation for prediction. If you don't know what causes the correlation, you won't know what causes its predictive power to break down. E.g., Google Flu Trends correlated flu with winter; it totally missed a summer flu outbreak. Demand transparency in models in order to understand how they can break down and be exploited.
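The Google Flu Trends point can be caricatured in a few lines (a toy model, nothing like the real system): a predictor that leans on a winter correlation fails the moment flu arrives in warm weather.

```python
# Toy illustration: a model that learned "cold weather => flu"
# from winter data, where the correlation happened to hold.

def flu_risk_from_temperature(temp_c):
    """Proxy model: predicts flu risk from temperature alone."""
    return "high" if temp_c < 10 else "low"

# In winter, the proxy looks accurate:
print(flu_risk_from_temperature(2))   # "high" - flu season, correct

# A summer outbreak breaks the correlation the model relied on:
print(flu_risk_from_temperature(25))  # "low" - outbreak missed
```

Because temperature never caused flu, the model had no way to know when its proxy would stop working - which is exactly why transparency about what a model actually uses matters.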

--Computers and algorithms aren't racist, sexist, etc. But if they are trained on our own historical biases, they can be. Ask which algorithms we can trust, and what we can trust them with.

--Don't take statistical bedrock for granted. Statisticians can be pressured to change or omit the stats.

--Never let zippy design distract you from the fact that the underlying numbers may be wrong.

--Without well-defined standards for statistical record-keeping, nothing adds up. Numbers can easily confuse us when unmoored from a clear definition.

--Bottom line: be curious and question everything!