Strategies to address questionable statistical practices.

If you have not yet read all you want to read about the wrongdoing of social psychologist Diederik Stapel, you may be interested in reading the 2012 Tilburg Report (PDF) on the matter. The full title of the English translation is “Flawed science: the fraudulent research practices of social psychologist Diederik Stapel” (in Dutch, “Falende wetenschap: De frauduleuze onderzoekspraktijken van sociaal-psycholoog Diederik Stapel”), and it’s 104 pages long, which might make it beach reading for the right kind of person.

If you’re not quite up to the whole report, Error Statistics Philosophy has a nice discussion of some of the highlights. In that post, D. G. Mayo writes:

The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). …

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses, as to count as no evidence at all (see some from their list). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory.”

You’d imagine this would raise the stakes pretty significantly for the researcher who could be teetering on the edge of verification bias: fall off that cliff and what you’re doing is no longer worthy of the name scientific knowledge-building.

Psychology, after all, is one of those fields given a hard time by people in “hard sciences,” which are popularly reckoned to be more objective, more revealing of actual structures and mechanisms in the world, and generally more science-y. Fair or not, this might mean that psychologists have something to prove about their hardheadedness as researchers and about the stringency of their methods. Some peer pressure within the field to live up to such standards would obviously be a good thing, and it would certainly be a better thing for the scientific respectability of psychology than an “everyone is doing it” excuse for less stringent methods.

Plus, isn’t psychology a field whose practitioners should have a grip on the various cognitive biases to which we humans fall prey? Shouldn’t psychologists understand better than most the wisdom of putting structures in place (whether embodied in methodology or in social interactions) to counteract those cognitive biases?

Remember that part of Stapel’s M.O. was keeping current with the social psychology literature so he could formulate hypotheses that fit very comfortably with researchers’ expectations of how the phenomena they studied behaved. Then, fabricating the expected results for his “investigations” of these hypotheses, Stapel caught peer reviewers being credulous rather than appropriately skeptical.

Short of trying to reproduce Stapel’s reported experiments themselves, how could peer reviewers avoid being fooled? Mayo has a suggestion:

Rather than report on believability, researchers need to report the properties of the methods they used: What was their capacity to have identified, avoided, admitted verification bias? The role of probability here would not be to quantify the degree of confidence or believability in a hypothesis, given the background theory or most intuitively plausible paradigms, but rather to check how severely probed or well-tested a hypothesis is– whether the assessment is formal, quasi-formal or informal. Was a good job done in scrutinizing flaws…or a terrible one? Or was there just a bit of data massaging and cherry picking to support the desired conclusion? As a matter of routine, researchers should tell us.

I’m no social psychologist, but this strikes me as a good concrete step that could help peer reviewers make better evaluations — and that should help scientists who don’t want to fool themselves (let alone their scientific peers) to be clearer about what they really know and how well they really know it.
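
To make the worry about cherry picking a bit more concrete, here is a toy simulation of my own (it is not from Mayo’s post or from the Report). It assumes a made-up study design in which there is no real effect at all: two groups are drawn from the same population, and a researcher either tests one outcome chosen in advance or measures several independent outcomes and reports whichever looks best. The function names, sample sizes, and the independence of the outcomes are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def null_p_value(n_per_group, rng):
    """Two-sided z-test p-value comparing two groups drawn from the SAME
    N(0, 1) population, so any 'effect' found is a false positive."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    # Variance is known to be 1 here, so the standard error is sqrt(2 / n).
    z = diff / (2 / n_per_group) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def apparent_support_rate(n_outcomes, n_studies=2000, n_per_group=20,
                          alpha=0.05, seed=0):
    """Fraction of no-effect studies that can still report a 'significant'
    result when the best of n_outcomes independent measures is reported."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = [null_p_value(n_per_group, rng) for _ in range(n_outcomes)]
        if min(p_values) < alpha:  # cherry-pick the best-looking outcome
            hits += 1
    return hits / n_studies

# Hypothetical comparison: one prespecified outcome vs. the best of ten.
prespecified = apparent_support_rate(n_outcomes=1)
cherry_picked = apparent_support_rate(n_outcomes=10)
print(f"prespecified outcome: {prespecified:.3f}")
print(f"best of ten outcomes: {cherry_picked:.3f}")
```

Under these assumptions the prespecified test should come out “significant” about 5% of the time, while the best-of-ten version should do so around 40% of the time (analytically, 1 − 0.95^10 ≈ 0.40), even though there is nothing to find. A reader who is told only the winning p-value cannot distinguish the two procedures, which is one way of seeing why a report of how the test was actually conducted matters so much.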
