There has been a lot written lately about the lack of
replicability of many studies, of things we thought we knew but now have to
hold at arm’s length. The social sciences have long been notorious for this,
but medical claims are also looking fragile.
It is hardly surprising that entities with something to gain or lose,
such as pharmaceutical companies, might bury results that showed nonsignificant
or even contradictory outcomes. Not honest, but not surprising – and pharma is
not necessarily exceptional. Researchers
vying only for honor, or perhaps publication to keep their jobs, do it as well.
We can always blame the journalists for oversimplifying and
raising our expectations unrealistically.
That’s fair. We can also blame researchers for not having enough
courage, objectivity, humility, sense of honor – pick a missing virtue, really
– and that’s fair too. Yes, it’s
difficult, and I might be worse at it myself, but it IS their job, no? Yes, it is
problematic that those classic experiments in every freshman psychology book
are not all proving out, undermining theories and further research that was
based on those truthies. But the point
is to get them right. Right?
Yet I do share some of the concerns of the overdefensive
researchers. In the presence of doubt,
it is true that much of the general public will conclude that we know nothing,
and therefore taking Vitamin C for cancer is just as good as all those
fancy-schmancy chemicals the medical
establishment is pushing. Lord knows knocking down the idea that there is
something magical about things that are “natural” is often on my mind.
Similarly, in my own field, sometimes diagnosis is
inaccurate and/or treatment does little good. Yet some things can be shown to
work often, and that has value. More
important, we have things that we know don’t work, and can at least not waste
our time on them.
40% replicability means that 40% did turn out to be
true. We thought it was 90-100% and it
isn’t, but we still do know some things we didn’t a century ago. Real life doesn’t tie up as neatly as it
looks at the science fair. Periodic overthrow of 30% of what we were sure was
true might be the normal course of events.
All this upheaval, chicanery, and accusation might be the only way
forward among human beings. Particularly in the social sciences, where all
theories rather obviously have societal implications, we come up against a
difficulty. Because the possible invisible confounding variables are so many,
all experiments must of necessity show only narrow, tentative results. Ooh,
except everyone who goes into that field wants things that are much
grander. They want large theories of
everything, explaining why boys do what they do, or whether criminals played
particular video games.
9 comments:
I am, at the moment, bothering my head with writing a routine to detect which Unicode encoding, if any, a file is written in. (I will spare you the gory details.) The thing about this task is that it's entirely possible for a given file to look like UTF-16 BE or UTF-16 LE, or in some circumstances UTF-32 BE, or MacRoman, or even (God help us) binary and not Unicode at all. There is almost nothing that will definitively tell us "Yep, this is the following encoding." There is a standard for marking the encoding at the start of the file (the byte-order mark, or BOM), which (of course) not all programs use, so we might have a file with or without this mark, depending on what program created it.
In the face of all this uncertainty, there is nothing for it but to do some statistical analysis. Certain characters in a certain pattern indicate (but do not guarantee) a certain encoding. If you have N indicators that it's this one, and M indicators that it's that one instead, what do you do? It's trickier than I'd thought at first because it requires judgment calls.
And this is in programming, which is an artificial world whose rules humans created.
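Something like the following sketch, perhaps -- the BOM table is real, but the indicator set and the weights are illustrative guesses, not jaed's actual routine:

# Sketch of the BOM-then-heuristics approach: the BOM check is (nearly)
# definitive; everything after it is a judgment call. The indicators and
# weights here are assumptions chosen for illustration.

BOMS = [
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # must be tested before utf-16-le
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xef\xbb\xbf", "utf-8-sig"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

def guess_encoding(data: bytes) -> str:
    # A BOM settles it -- but many programs never write one.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name

    scores = {"utf-16-be": 0, "utf-16-le": 0, "utf-8": 0, "binary": 0}

    # Mostly-ASCII text stored as UTF-16 shows NUL bytes in alternating
    # positions: NULs at even offsets suggest BE, at odd offsets LE.
    scores["utf-16-be"] = sum(1 for i in range(0, len(data) - 1, 2) if data[i] == 0)
    scores["utf-16-le"] = sum(1 for i in range(1, len(data), 2) if data[i] == 0)

    # Decoding cleanly as UTF-8 is a strong (not conclusive) indicator;
    # weighting it at half the file length is an arbitrary judgment call.
    try:
        data.decode("utf-8")
        scores["utf-8"] = len(data) // 2
    except UnicodeDecodeError:
        pass

    # Control bytes 0x01-0x08 almost never appear in text of any encoding.
    scores["binary"] = sum(1 for b in data if 0 < b < 9)

    # N indicators for this one, M for that one: take the max and hope.
    return max(scores, key=scores.get)

The last line is where the judgment call lives: the weights decide which pile of indicators wins, and nothing in the file itself can tell you the weights are right.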
Can you picture Socrates in a public service ad touting the virtues of all-natural hemlock?
Arsenic is natural, too.
Jaed, I feel for ya. The gory truth that all systems engineers know is that nothing about computers is logical, because they are designed by humans.
Well, to do anything useful the computer programs have to match reality somewhere along the line, and reality tends to be a bit messy.
So, one person who's been writing about this is me. I'm going to give a minor and a major reason to dissent.
1) 40% replication doesn't mean that 40% of the theories were true. It means that 40% of the experiments replicated one time. They might fail the next time. We can't be sure from a single replication that anything important has happened -- that small a figure could be random chance.
2) The real thing that concerns me is epicycles. I mean by this the pre-Copernican theories about the way the planets -- the "wandering stars" -- moved in the heavens. The concept was that they were somehow attached to spheres, and their apparent motion was really a feature of the rotation of several different concentric spheres. There was a problem, though: some of them exhibited a kind of retrograde motion that couldn't be explained by the general theory.
A number of theories were put forward to resolve these few small problems. Replication, though, was really high -- more than strong enough for navigational purposes, which was the main thing the theories were used to accomplish.
The problem just was that there were no spheres. In fact, there was no concentric nature on which to build spheres. The whole model of secret, hidden things to explain the observations was fundamentally wrong.
That's what I think is probably going on with psychology. It's not that you can't replicate things occasionally. Epicycles could replicate almost every time -- often enough to get ships into port reliably. They'd managed this for hundreds of years, which is an impressive record from the standpoint of replication.
What worries me is that the basic furniture, the analogs to the crystal spheres, is likewise unlikely to prove to be real. What worries me, too, is that the whole structure mirrors epicycles, and yet doesn't attain to anything like the epicyclic accuracy.
FWIW, Russo suggested that epicycles were originally calculational tools of the "compass and straightedge" Greeks, and acquired their nominal physical significance later. Dunno; all records are lost.
Wrt item 1), some of the 60% "failed to replicate" accidentally too, so the results average out to something like the initial 40%. The rate at which experiments fail to replicate depends on the true effect, and also on how variable that effect is. You could imagine a drug that makes an average 1% improvement in quality of life, but with variability so great that almost half the patients wish they'd never set eyes on it.
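To put rough numbers on that averaging: suppose a fraction f of the original findings are real, real effects replicate at the study's power, and false ones "replicate" by luck at the false-positive rate. Then the observed rate is f x power + (1 - f) x alpha. All the figures below are assumptions for illustration, not estimates from any actual replication project:

# Back-of-the-envelope model of observed replication rates.

def expected_replication_rate(f, power=0.8, alpha=0.05):
    # Real effects replicate at the study's power; false ones
    # "replicate" by luck at the false-positive rate alpha.
    return f * power + (1 - f) * alpha

for f in (0.2, 0.4, 0.6):
    print(f"{f:.0%} of findings true -> {expected_replication_rate(f):.0%} expected to replicate")

# Under these assumed rates, an observed 40% replication rate points to
# roughly (0.40 - 0.05) / (0.80 - 0.05), or about 47%, of findings being
# true -- and with lower power, the implied true fraction is higher still.

Either way, the observed replication rate is not the fraction of true findings.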
Is it right that the 60% should 'fail to replicate accidentally' at the same rate as the 40% would give false positives? It seems that if you get the relationship right, it would require a fairly big accident for the replication to fail; whereas if you were wrong about the relationship, a small accident could arrange the correlation if the condition were fairly common anyway.
For example, a study attempting to show that sun shining on ice is correlated with ice sublimating or melting should replicate strongly -- something pretty big has to happen in the environment to stop the replication. On the other hand, a study suggesting that having a headache correlates to having money in your pocket should give false positives fairly often, just because most people often carry money (and the correlation would seem stronger for more common kinds of money than less common kinds). Accidental replication doesn't seem to be likely to occur at the same rate as accidental failure of replication in cases of true connections, but rather at a rate that will float based on independent factors.
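A toy simulation shows the two rates being set by different dials (the effect sizes, sample size, and cutoff below are all assumed for illustration, not taken from any real study):

# The rate at which a true effect accidentally fails and the rate at
# which a false one accidentally succeeds have no reason to match.
import random
import statistics

def replication_rate(effect, n=30, trials=5000, threshold=None):
    # Declare "replication" when the sample mean clears a fixed cutoff;
    # a stand-in for a significance test, to keep the sketch short.
    if threshold is None:
        threshold = 2 * (1 / n ** 0.5)  # roughly two standard errors for sd = 1
    hits = 0
    for _ in range(trials):
        sample_mean = statistics.mean(random.gauss(effect, 1) for _ in range(n))
        hits += sample_mean > threshold
    return hits / trials

random.seed(1)
print("strong true effect (sun on ice):   ", replication_rate(effect=1.0))  # nearly always replicates
print("no real effect (headaches, money): ", replication_rate(effect=0.0))  # ~2-3%, by luck alone
print("weak, noisy effect (the 1% drug):  ", replication_rate(effect=0.1))  # fails most of the time

The false link "replicates" at a rate fixed by the testing threshold; the true one fails at a rate fixed by how big and how noisy the effect is. Independent dials, as the headache example suggests.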
Compare the analogy to a TB test, which I was offering the other day. The TB test is conducted by looking for antibodies, which can be found (or not) in the blood, and which are produced by exposure to TB.
Could the antibodies exist in someone not exposed to TB, producing a false positive? Sure. I could also get a false negative if my body has some additional condition that prevents me from forming antibodies.
But the relationship is firm enough that we should expect this to be relatively rare; we wouldn't expect false results nearly as often as we would if we were wrongly correlating TB with an unrelated condition.
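With made-up numbers, Bayes' rule puts a figure on "firm enough" -- the sensitivity, specificity, and prevalence below are all assumptions, and the gap between the two results is the point:

# How much a positive result is worth depends on how tightly the marker
# is tied to the condition, and on how common the marker is otherwise.

def positive_predictive_value(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Firm relationship: the marker is rare in the unexposed population.
print(positive_predictive_value(0.95, 0.95, 0.05))  # ~0.50
# Loose relationship: the "marker" occurs in most people anyway.
print(positive_predictive_value(0.95, 0.20, 0.05))  # ~0.06

Even the firm test gives plenty of false positives at low prevalence, but a positive result still means something; the loose one tells you almost nothing you didn't already know from the base rate.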
True, they don't have to be symmetric. My estimate relies on an assumption that strong effects are going to be rare in these studies.