What kind of problem is it when data do not support findings?

And, whose problem is it?

Yesterday, The Boston Globe published an article about Harvard University psychologist Marc Hauser, a researcher embarking on a leave from his appointment in the wake of a retraction and a finding of scientific misconduct in his lab. From the article:

In a letter Hauser wrote this year to some Harvard colleagues, he described the inquiry as painful. The letter, which was shown to the Globe, said that his lab has been under investigation for three years by a Harvard committee, and that evidence of misconduct was found. He alluded to unspecified mistakes and oversights that he had made, and said he will be on leave for the upcoming academic year. …

Much remains unclear, including why the investigation took so long, the specifics of the misconduct, and whether Hauser’s leave is a punishment for his actions.

The retraction, submitted by Hauser and two co-authors, is to be published in a future issue of Cognition, according to the editor. It says that, “An internal examination at Harvard University . . . found that the data do not support the reported findings. We therefore are retracting this article.’’

The paper tested cotton-top tamarin monkeys’ ability to learn generalized patterns, an ability that human infants had been found to have, and that may be critical for learning language. The paper found that the monkeys were able to learn patterns, suggesting that this was not the critical cognitive building block that explains humans’ ability to learn language. In doing such experiments, researchers videotape the animals to analyze each trial and provide a record of their raw data. …

The editor of Cognition, Gerry Altmann, said in an interview that he had not been told what specific errors had been made in the paper, which is unusual. “Generally when a manuscript is withdrawn, in my experience at any rate, we know a little more background than is actually published in the retraction,’’ he said. “The data not supporting the findings is ambiguous.’’

Gary Marcus, a psychology professor at New York University and one of the co-authors of the paper, said he drafted the introduction and conclusions of the paper, based on data that Hauser collected and analyzed.

“Professor Hauser alerted me that he was concerned about the nature of the data, and suggested that there were problems with the videotape record of the study,’’ Marcus wrote in an e-mail. “I never actually saw the raw data, just his summaries, so I can’t speak to the exact nature of what went wrong.’’

The investigation also raised questions about two other papers co-authored by Hauser. The journal Proceedings of the Royal Society B published a correction last month to a 2007 study. The correction, published after the British journal was notified of the Harvard investigation, said video records and field notes of one of the co-authors were incomplete. Hauser and a colleague redid the three main experiments and the new findings were the same as in the original paper. …

“This retraction creates a quandary for those of us in the field about whether other results are to be trusted as well, especially since there are other papers currently being reconsidered by other journals as well,’’ Michael Tomasello, co-director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, said in an e-mail. “If scientists can’t trust published papers, the whole process breaks down.’’ …

In 1995, he [Hauser] was the lead author of a paper in the Proceedings of the National Academy of Sciences that looked at whether cotton-top tamarins are able to recognize themselves in a mirror. Self-recognition was something that set humans and other primates, such as chimpanzees and orangutans, apart from other animals, and no one had shown that monkeys had this ability.

Gordon G. Gallup Jr., a professor of psychology at State University of New York at Albany, questioned the results and requested videotapes that Hauser had made of the experiment.

“When I played the videotapes, there was not a thread of compelling evidence — scientific or otherwise — that any of the tamarins had learned to correctly decipher mirrored information about themselves,’’ Gallup said in an interview.

A quick rundown of what we get from this article:

  • Someone raised a concern about scientific misconduct that led to the Harvard inquiry, which in turn led to the discovery of “evidence of misconduct” in Hauser’s lab.
  • We don’t, however, have an identification of what kind of misconduct is suggested by the evidence (fabrication? falsification? plagiarism? other serious deviations from accepted practices?) or of who exactly committed it (Hauser or one of the other people in his lab).
  • At least one paper has been retracted because “the data do not support the reported findings”.
  • However, we don’t know the precise issue with the data here — e.g., whether the reported findings were bolstered by reported data that turned out to be fabricated or falsified (and are thus not being included anymore in “the data”).
  • Apparently, the editor of the journal that published the retracted paper doesn’t know the precise issue with the data, either, and found this situation unusual enough to merit comment on the retraction.
  • Other papers from the Hauser group may be under investigation for similar reasons at this point, and other researchers in the field seem to be nervous about those papers and their reliability in light of the ongoing inquiry and the retraction of the paper in Cognition.

There’s already been lots of good commentary on what might be going on with the Hauser case. (I say “might” because there are many facts still not in evidence to those of us not actually on the Harvard inquiry panel. As such, I think it’s necessary to refrain from drawing conclusions not supported by the facts that are in evidence.)

John Hawks situates the Hauser case in terms of the problem of subjective data.

Melody has a nice discussion of the political context of getting research submitted to journals, approved by peer reviewers, and anointed as knowledge.

David Dobbs wonders whether the effects of the Hauser case (and of the publicity it’s getting) will mean backing off from overly strong conclusions drawn from subjective data, or backing off too far from a “hot” scientific field that may still have a bead on some important phenomena in our world.

Drugmonkey critiques the Boston Globe reporting and reminds us that failure to replicate a finding is not evidence of scientific misconduct or fraud. That’s a hugely important point, and one that bears repeating. Repeatedly.

This is the kind of territory where we start to notice common misunderstandings about how science works. It’s usually not the case that we can cut nature at the joints along nicely dotted lines that indicate just where those cuts should be. Collecting reliable data and objectively interpreting that data is hard work. Sometimes as we go, we learn more about better conditions for collecting reliable data, or better procedures for interpreting the data without letting our cognitive biases do the driving. And sometimes, a data set we took to be reliable and representative of the phenomenon we’re trying to understand just isn’t.
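To make “better procedures” a bit more concrete: in studies like these, where human observers code videotaped trials, one common safeguard is to have two coders score the tapes independently (ideally blind to condition) and then check how well their codes agree. The little Python sketch below computes Cohen’s kappa, a chance-corrected agreement statistic, for two hypothetical coders; the coder names and trial codes are invented for illustration and are not drawn from the Hauser studies.

    from collections import Counter

    def cohens_kappa(coder_a, coder_b):
        """Chance-corrected agreement between two coders' categorical judgments."""
        n = len(coder_a)
        # Observed agreement: fraction of trials on which the coders gave the same code.
        p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
        # Agreement expected by chance, from each coder's marginal code frequencies.
        freq_a, freq_b = Counter(coder_a), Counter(coder_b)
        p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        if p_expected == 1.0:  # both coders used a single identical code throughout
            return 1.0
        return (p_observed - p_expected) / (1 - p_expected)

    # Hypothetical trial-by-trial codes ("R" = response, "N" = no response) from two
    # independent, condition-blind coders watching the same videotaped trials.
    coder_1 = ["R", "R", "N", "R", "N", "N", "R", "R", "N", "R"]
    coder_2 = ["R", "N", "N", "R", "N", "N", "R", "R", "N", "R"]
    print(f"Cohen's kappa: {cohens_kappa(coder_1, coder_2):.2f}")  # 0.80 for these made-up codes

A kappa near 1 means the coding is reproducible across observers; a low kappa is a warning that the “data” may reflect the coder’s expectations as much as the animals’ behavior, which is exactly the worry with subjective measures.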

That’s part of why scientific conclusions are always tentative. Scientists expect to update their current conclusions in the light of new results down the road — and in the light of our awareness that some of our old results just weren’t as solid or reproducible as we took them to be. It’s good to be sure your results are reproducible enough before you announce a finding to your scientific peers, but to be absolutely certain of total reproducibility, you’d have to solve the problem of induction, which isn’t terribly practical.

Honest scientific work can lead to incorrect conclusions, either because that honest work yielded wonky data from which to draw conclusions, or because good data can still be consistent with incorrect conclusions.

And, there’s a similar kind of disconnect we should watch out for. For the “corrected” 2007 paper in Proceedings of the Royal Society B, the Boston Globe article reports that videotapes and field notes (the sources of the data to support the reported conclusions) were “incomplete”. But, Hauser and a colleague redid the experiments and found data that supported the conclusions reported in this paper.

One might think that as long as reported results are reproducible, the work behind them must be sufficiently ethical and scientifically sound and all that good stuff. That’s not how scientific knowledge-building works. The rules of the game are that you lay your data-cards on the table and base your findings on those data. Chancing upon an answer that turns out to be right but isn’t supported by the data you actually have doesn’t count, nor does having a really strong hunch that turns out to be right. In the scientific realm, empirical data is our basis for knowing what we know about the phenomena. Thus, doing the experiments over in the face of insufficient data is not “playing it safe” so much as “doing the job you were supposed to have done in the first place”.

Now, given the relative paucity of facts in this particular case, I find myself interested in a more general question: What are the ethical duties of a PI who discovers that he or she has published a paper whose findings are not, in fact, supported by the data?

It seems reasonable that at least one of his or her duties involves correcting the scientific literature.

This could involve retracting the paper, in essence saying, “Actually, we can’t conclude this based on the data we have. Our bad!”

It could also involve correcting the paper, saying, “We couldn’t conclude this based on the data we have; instead, we should conclude this other thing,” or, “We couldn’t conclude this based on the data we originally reported, but we’ve gone and done more experiments (or have repeated the experiments we described), obtained these data, and are now confident that, on the basis of these data, the conclusion is well supported.”

If faulty data were reported, I would think that the retraction or correction should probably explain how the data were faulty — what’s wrong with them? If the problem had its source in an honest mistake, it might also be valuable to identify that honest mistake so other researchers could avoid it themselves. (Surely this would be a kindness; is it also a duty?)

Beyond correcting the scientific literature, does the PI in this situation have other relevant duties?

Would these involve ratcheting up the scrutiny of data within the lab group in advance of future papers submitted for publication? Taking the skepticism of other researchers in the field more seriously and working that much harder to build a compelling case for conclusions from the data? (Or, perhaps, working hard to identify the ways that the data might argue against the expected conclusion?) Making serious efforts to eliminate as much subjectivity from the data as possible?

Assuming the PI hasn’t fabricated or falsified the data (and that if someone in the lab group has, that person has been benched, at least for the foreseeable future), what kind of steps ought that PI to take to make things right — not just for the particular problematic paper(s), but for his or her whole research group moving forward and interacting with other researchers in the field? How can they earn back trust?


7 Comments

  1. “In the scientific realm, empirical data is our basis for knowing what we know about the phenomena. ”

    Actually, no, it is not.

    Our basis for knowing is the combination of our data and our theories. The idea that all you have to do is put the data cards on the table has been disproven repeatedly in the history of science.

    What is important is to state accurately what the data you do have actually support, and what conclusion you draw from them. The idea that all that takes is to “lay your data-cards on the table and base your findings on those data” misses out on a lot of the scientific process.

    And, assuming that there hasn’t been any fraud, falsification, etc. going on, the proper response to discovering that Experiment 2 doesn’t replicate the results of Experiment 1 is not to retract Paper 1. It might be to write another paper, call it Paper 2, that presents both Experiments (and maybe some other knowledge gained between the two experiments) and draws conclusions from them. Maybe conclusions the opposite of Paper 1. That’s OK. That’s science. That’s ethical.

    Please, in discussing these issues, it’s important to get away from a simplistic version of science that thinks it’s all data, all the time. It is not.

    • @ ecologist: You’re right, the empirical data are giving us “knowledge” only in conjunction with our background theories (and the vetting of both those theories and our data by the other people in our epistemic community, etc.). The point I was trying to make was that, while the data may not be sufficient for us to draw correct conclusions, they are surely necessary if we are to offer conclusions that other scientists would take to be properly supported.

      (Of course, I totally overlooked here the sorts of knowledge that theoreticians build, which may be constrained by empirical data but do not depend on data in the same way. That’s a topic for another post.)

      I agree, as well, that finding new information that contradicts old conclusions, or coming to a new (and presumably better) understanding of what was happening in the experiments reported in your old paper, does not normally require a retraction. However, finding a significant flaw in the reported data that bolstered the old conclusions might require one (even without any ethical shenanigans).

  2. > Beyond correcting the scientific literature, does the PI in this situation
    > have other relevant duties?

    Yes. Disclosure.

    What actually happened? Academics are loath to publicly accuse someone of malfeasance without pretty solid evidence, so I can understand everyone’s desire to just say, “Something broke, disregard this finding.”

    However, as you point out, if the mistake was an honest accident, talking about it might prevent someone else from repeating the accident. If the mistake was a case of confirmation bias on the part of a researcher, it’s *still* something that should be disclosed, not to place a scarlet letter on the researcher in question but to remind everyone how to compensate/control for confirmation bias (admittedly, blackballing of some sort may be a practical outcome).

    And the credibility of the lab is going to suffer, justifiably or not. Which is too bad, but part of the practical process of science and the limitations of funding means that we can’t necessarily check every result; if you’re an academic you have some credibility capital above and beyond a layperson. If you’ve screwed the pooch this bad, greater scrutiny of your future work, particularly for findings of great import, is important.

  3. Thanks for the PING. I particularly enjoyed the paragraph about the misinterpretation of the Royal Society B paper (and what incomplete data points to).

    I posted this as a comment to my blog, but I think it bears mention again here, because this story about the data being “incomplete” is being repeated again and again, in comments and posts, as if that somehow exonerates the Cognition retraction, and as if all it amounts to is Hauser being a piss-poor record keeper.

    The APA guideline for maintaining physical records is three years. Yet Hauser is retracting a paper from 2002. That makes the paper itself over seven years old, and the data likely much older still. Unless Harvard administrators have overnight turned fascist, it seems highly unlikely that this is just to do with Hauser’s record keeping. It seems more like a polite way of saying “there wasn’t much data there to begin with.”

  4. The most troubling fact reported in this piece is Gallup’s claim that Hauser’s tapes provided “not a thread of compelling evidence — scientific or otherwise — that any of the tamarins had learned to correctly decipher mirrored information about themselves.” Hauser apparently does not dispute this at least to the extent that he tried and failed to reproduce his 1995 findings in 2000 (http://www.wjh.harvard.edu/%7Emnkylab/publications/learnconcepts/mirror.pdf).

    That a reported fact of the past has turned out to be nonsense suggests that we ought to be suspicious of Hauser’s work. Add to this the fact that his more recent work is being called into question, and it is a rather ugly picture.

    What should the PI do? Retirement might not be a bad idea.

    Timothy E. Kennelly

  5. Pingback: Harvard Psych department may have a job opening. | Adventures in Ethics and Science
