One way of getting the wrong end of the stick is The Prosecutor's Fallacy. I had heard of that and even tried to keep it in mind when numbers started getting thrown around in criminal cases. It's the idea that when there are conditional probabilities we sometimes apply them in the wrong direction: we treat the chance that an innocent person matches as if it were the chance that a person who matches is innocent. This may be genuine ignorance, or it may be an attempt to deceive. We see it with tests that have what looks like a low false positive rate, like 1%. A crime has been committed and, after screening some body of test data, for example DNA, the police find a suspect who is a match. A million people were scanned, and the prosecutor might quote the statistic that the test eliminated 990,000 people, and wasn't it extremely odd that the suspect matched?
But there were 10,000 false positives in the data. Each of them would also match that extremely odd occurrence. It would be like accusing someone of cheating on a lottery because the likelihood of their winning was very small.
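The arithmetic behind those 10,000 false positives can be sketched in a few lines. The population and the 1% rate come from the example above; the assumptions that exactly one true culprit is among the people scanned, and that the test never misses the culprit, are mine and are only there to make the sum concrete.

```python
# Figures from the example: one million people scanned, 1% false positive rate.
population = 1_000_000
false_positive_rate = 0.01

# Assumptions (not in the original post): exactly one true culprit is in
# the scanned population, and the test always matches the true culprit.
culprits = 1
innocents = population - culprits

expected_false_matches = innocents * false_positive_rate
expected_total_matches = expected_false_matches + culprits

# P(guilty | match): the culprit's share of everyone who matches.
p_guilty_given_match = culprits / expected_total_matches

print(f"expected innocent matches: {expected_false_matches:,.0f}")
print(f"P(guilty | match): {p_guilty_given_match:.6f}")
```

Under those assumptions a match, taken by itself, leaves the suspect about as likely to be guilty as any of the other ten thousand or so people who matched.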
Or to bring it to civil suits, it is applying in retrospect how rare it is that so many people should have died in that hospital that week on one particular nurse's shift, or that so many people got cancer while living near a particular landfill. But on each of these, we would need to know about the base rate, just for openers. Perhaps that hospital is world-renowned and gets all the toughest cases for a particularly risky procedure. Perhaps the area in question is very sparsely populated, so that a few extra cases of unknown attribution make the canal look more dangerous. What happened at other places on the canal?
But I had not thought the whole thing through, not quite, that the whole fallacy can be reversed and is indeed the Defense Attorney's fallacy. It is linked in the Prosecutor's Fallacy above. It was just discussed on the studies show with especial reference to the OJ Simpson trial. The defense wanted to show that even if he had beaten his wife, it was extremely unusual for a wife who has been beaten to be murdered in any given year, on the order of 1-in-2500.
But that's not really the number we are looking for, because we already know that she had, in fact, been murdered. What we are looking for is the likelihood that the husband who had beaten her had done it. There is a base rate of 5 out of 100,000 women murdered across the population. If we assume that the 1-in-2,500 figure for beaten women is correct, then of 100,000 women with abusive husbands, about 99,955 will not be murdered. Of the remaining 45 who are murdered, five are the background rate of American women being murdered: murders that happened for some other reason. That leaves 40 (100,000 divided by 2,500) who will have been murdered by their husbands. So the correct probability that the husband did it is 40 out of 45, or nearly 90%.
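Here is the same calculation as a small sketch, using only the figures quoted above. It reads the 1-in-2,500 figure as murders by the abusive husband and the 5-in-100,000 figure as murders by anyone else, which is how the argument uses them; the variable names are just for illustration.

```python
from fractions import Fraction

abused_women = 100_000

# 1 in 2,500 abused women murdered by the abusive husband in a given year.
murdered_by_husband = abused_women * Fraction(1, 2500)

# Background rate: 5 in 100,000 women murdered for some other reason.
murdered_by_other = abused_women * Fraction(5, 100_000)

murdered_total = murdered_by_husband + murdered_by_other
not_murdered = abused_women - murdered_total

# P(husband did it | she was murdered)
p_husband = murdered_by_husband / murdered_total

print(f"murdered by husband: {murdered_by_husband}")
print(f"murdered by someone else: {murdered_by_other}")
print(f"not murdered: {not_murdered}")
print(f"P(husband | murdered) = {p_husband} = {float(p_husband):.1%}")
```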
Well that's a different kettle of fish, isn't it? The key is not looking at the prior likelihood of a woman being murdered, but at who did it, given that she was murdered.
We see something similar in seemingly random digits that suddenly show some amazing pattern, like 30 sixes in a batch of 100. What are the odds, eh? Something fishy must be up. But the real question is "once the hundred digits are up, what are the odds that we can find some interesting pattern, like lots of 6's or hardly any 4's or three separate runs of the sequence 2468?" If you find lots of patterns interesting, then the odds of your finding one are close to 100%.
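A rough simulation makes the point. Everything specific here is an assumption for illustration: the three "interesting" patterns, their thresholds, and the function names are made up, not taken from any study. The sketch compares the chance of one pre-chosen pattern with the chance that at least one pattern from the list turns up.

```python
import random

def has_common_digit(digits, threshold=18):
    """Some digit shows up at least `threshold` times."""
    return any(digits.count(d) >= threshold for d in range(10))

def has_rare_digit(digits, threshold=3):
    """Some digit shows up at most `threshold` times."""
    return any(digits.count(d) <= threshold for d in range(10))

def has_long_run(digits, length=4):
    """The same digit repeats `length` or more times in a row."""
    run = 1
    for prev, cur in zip(digits, digits[1:]):
        run = run + 1 if cur == prev else 1
        if run >= length:
            return True
    return False

# Hypothetical list of things a reader might call "interesting".
PATTERNS = [has_common_digit, has_rare_digit, has_long_run]

def estimate(trials=2000, n_digits=100, seed=0):
    """Return (P(first pattern), P(at least one pattern)) by simulation."""
    rng = random.Random(seed)
    single_hits = 0
    any_hits = 0
    for _ in range(trials):
        digits = [rng.randrange(10) for _ in range(n_digits)]
        hits = [pattern(digits) for pattern in PATTERNS]
        single_hits += hits[0]
        any_hits += any(hits)
    return single_hits / trials, any_hits / trials

p_single, p_any = estimate()
print(f"one pre-chosen pattern: {p_single:.3f}")
print(f"any pattern on the list: {p_any:.3f}")
```

The second number can never be smaller than the first, and it keeps climbing as more patterns are added to the list.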
You were right, I do like this post.
One of the surest ways to spot faked data is the absence of clusters and patterns. When humans try to create "randomness", they aim to make everything evenly spaced and distributed, which isn't what real data looks like. I noticed in pick-4 lottery numbers (essentially a random 4-digit number), there's often a duplicated digit, which to the human eye seems odd when it keeps happening. However, of the 10,000 possible numbers with digits from 0-9, only 5,040 have four distinct digits; the other 4,960 have at least one repeated digit. So a long string of "random" numbers with no repeated digits is actually the screwy result.
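The pick-4 count is small enough to check by brute force. This sketch simply enumerates all 10,000 four-digit draws (leading zeros allowed) and counts the ones that reuse a digit; the variable names are illustrative.

```python
from itertools import product

# Every possible pick-4 draw: four digits, each 0-9, leading zeros allowed.
draws = list(product(range(10), repeat=4))

# A draw has a repeat when its four digits are not all different.
with_repeat = sum(1 for draw in draws if len(set(draw)) < 4)
all_distinct = len(draws) - with_repeat

print(f"total draws: {len(draws)}")
print(f"at least one repeated digit: {with_repeat}")
print(f"four distinct digits: {all_distinct}")
```

The distinct-digit count is 10 × 9 × 8 × 7 = 5,040, which leaves 4,960 draws with a repeat, so close to half of all honest draws should show a duplicated digit.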
And whatever you define as "interesting" is interesting. Witness the interesting number paradox, https://en.wikipedia.org/wiki/Interesting_number_paradox#:~:text=Paradoxical%20nature,-Attempting%20to%20classify&text=The%20paradox%20is%20alleviated%20if,11630%20on%2012%20June%202009.
Or as Richard Feynman once said:
“You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
In his book Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work, G. H. Hardy tells this famous story:
He could remember the idiosyncrasies of numbers in an almost uncanny way. It was Littlewood who said every positive integer was one of Ramanujan’s personal friends. I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to be rather a dull one, and that I hoped it was not an unfavourable omen. “No,” he replied, “it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways.”
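Ramanujan's claim is easy to check with a short search. This is only a sketch: it assumes "cubes" means cubes of positive integers, and the function name and search limit are my own choices. A limit of 20 is more than enough, since any sum of two positive cubes no larger than 1729 uses numbers no larger than 12.

```python
from collections import defaultdict
from itertools import combinations_with_replacement

def smallest_two_way_cube_sum(limit=20):
    """Smallest n = a^3 + b^3 (1 <= a <= b <= limit) with two or more
    distinct (a, b) pairs, together with those pairs."""
    ways = defaultdict(list)
    for a, b in combinations_with_replacement(range(1, limit + 1), 2):
        ways[a**3 + b**3].append((a, b))
    candidates = {n: pairs for n, pairs in ways.items() if len(pairs) >= 2}
    n = min(candidates)
    return n, candidates[n]

number, pairs = smallest_two_way_cube_sum()
print(number, pairs)  # 1729 [(1, 12), (9, 10)]
```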