What kind of problem is it when data do not support findings?

And, whose problem is it?

Yesterday, The Boston Globe published an article about Harvard University psychologist Marc Hauser, a researcher embarking on a leave from his appointment in the wake of a retraction and a finding of scientific misconduct in his lab. From the article:

In a letter Hauser wrote this year to some Harvard colleagues, he described the inquiry as painful. The letter, which was shown to the Globe, said that his lab has been under investigation for three years by a Harvard committee, and that evidence of misconduct was found. He alluded to unspecified mistakes and oversights that he had made, and said he will be on leave for the upcoming academic year. …

Much remains unclear, including why the investigation took so long, the specifics of the misconduct, and whether Hauser’s leave is a punishment for his actions.

The retraction, submitted by Hauser and two co-authors, is to be published in a future issue of Cognition, according to the editor. It says that, “An internal examination at Harvard University . . . found that the data do not support the reported findings. We therefore are retracting this article.’’

The paper tested cotton-top tamarin monkeys’ ability to learn generalized patterns, an ability that human infants had been found to have, and that may be critical for learning language. The paper found that the monkeys were able to learn patterns, suggesting that this was not the critical cognitive building block that explains humans’ ability to learn language. In doing such experiments, researchers videotape the animals to analyze each trial and provide a record of their raw data. …

The editor of Cognition, Gerry Altmann, said in an interview that he had not been told what specific errors had been made in the paper, which is unusual. “Generally when a manuscript is withdrawn, in my experience at any rate, we know a little more background than is actually published in the retraction,’’ he said. “The data not supporting the findings is ambiguous.’’

Gary Marcus, a psychology professor at New York University and one of the co-authors of the paper, said he drafted the introduction and conclusions of the paper, based on data that Hauser collected and analyzed.

“Professor Hauser alerted me that he was concerned about the nature of the data, and suggested that there were problems with the videotape record of the study,’’ Marcus wrote in an e-mail. “I never actually saw the raw data, just his summaries, so I can’t speak to the exact nature of what went wrong.’’

The investigation also raised questions about two other papers co-authored by Hauser. The journal Proceedings of the Royal Society B published a correction last month to a 2007 study. The correction, published after the British journal was notified of the Harvard investigation, said video records and field notes of one of the co-authors were incomplete. Hauser and a colleague redid the three main experiments and the new findings were the same as in the original paper. …

“This retraction creates a quandary for those of us in the field about whether other results are to be trusted as well, especially since there are other papers currently being reconsidered by other journals as well,’’ Michael Tomasello, co-director of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, said in an e-mail. “If scientists can’t trust published papers, the whole process breaks down.’’ …

In 1995, he [Hauser] was the lead author of a paper in the Proceedings of the National Academy of Sciences that looked at whether cotton-top tamarins are able to recognize themselves in a mirror. Self-recognition was something that set humans and other primates, such as chimpanzees and orangutans, apart from other animals, and no one had shown that monkeys had this ability.

Gordon G. Gallup Jr., a professor of psychology at State University of New York at Albany, questioned the results and requested videotapes that Hauser had made of the experiment.

“When I played the videotapes, there was not a thread of compelling evidence — scientific or otherwise — that any of the tamarins had learned to correctly decipher mirrored information about themselves,’’ Gallup said in an interview.

A quick rundown of what we get from this article:

  • Someone raised a concern about scientific misconduct that led to the Harvard inquiry, which in turn led to the discovery of “evidence of misconduct” in Hauser’s lab.
  • We don’t, however, have an identification of what kind of misconduct is suggested by the evidence (fabrication? falsification? plagiarism? other serious deviations from accepted practices?) or of who exactly committed it (Hauser or one of the other people in his lab).
  • At least one paper has been retracted because “the data do not support the reported findings”.
  • However, we don’t know the precise issue with the data here — e.g., whether the reported findings were bolstered by reported data that turned out to be fabricated or falsified (and are thus not being included anymore in “the data”).
  • Apparently, the editor of the journal that published the retracted paper doesn’t know the precise issue with the data either, and found the situation unusual enough, for a retraction, to merit comment.
  • Other papers from the Hauser group may be under investigation for similar reasons at this point, and other researchers in the field seem to be nervous about those papers and their reliability in light of the ongoing inquiry and the retraction of the paper in Cognition.

There’s already been lots of good commentary on what might be going on with the Hauser case. (I say “might” because there are many facts still not in evidence to those of us not actually on the Harvard inquiry panel. As such, I think it’s necessary to refrain from drawing conclusions not supported by the facts that are in evidence.)

John Hawks situates the Hauser case in terms of the problem of subjective data.

Melody has a nice discussion of the political context of getting research submitted to journals, approved by peer reviewers, and anointed as knowledge.

David Dobbs wonders whether the effects of the Hauser case (and of the publicity it’s getting) will mean backing off from overly strong conclusions drawn from subjective data, or backing off too far from a “hot” scientific field that may still have a bead on some important phenomena in our world.

Drugmonkey critiques the Boston Globe reporting and reminds us that failure to replicate a finding is not evidence of scientific misconduct or fraud. That’s a hugely important point, and one that bears repeating. Repeatedly.

This is the kind of territory where we start to notice common misunderstandings about how science works. It’s usually not the case that we can cut nature at the joints along nicely dotted lines that indicate just where those cuts should be. Collecting reliable data and objectively interpreting that data is hard work. Sometimes as we go, we learn more about better conditions for collecting reliable data, or better procedures for interpreting the data without letting our cognitive biases do the driving. And sometimes, a data set we took to be reliable and representative of the phenomenon we’re trying to understand just isn’t.

That’s part of why scientific conclusions are always tentative. Scientists expect to update their current conclusions in the light of new results down the road — and in the light of our awareness that some of our old results just weren’t as solid or reproducible as we took them to be. It’s good to be sure your results are reproducible enough before you announce a finding to your scientific peers, but to be absolutely certain of total reproducibility, you’d have to solve the problem of induction, which isn’t terribly practical.

Honest scientific work can lead to incorrect conclusions, either because that honest work yielded wonky data from which to draw conclusions, or because even good data can be consistent with incorrect conclusions.

And, there’s a similar kind of disconnect we should watch out for. For the “corrected” 2007 paper in Proceedings of the Royal Society B, the Boston Globe article reports that videotapes and field notes (the sources of the data supporting the reported conclusions) were “incomplete”. But Hauser and a colleague redid the experiments and found data that supported the conclusions reported in this paper. One might think that as long as reported results are reproducible, the work behind them must have been sufficiently ethical and scientifically sound and all that good stuff. That’s not how scientific knowledge-building works. The rules of the game are that you lay your data-cards on the table and base your findings on those data. Chancing upon an answer that turns out to be right but isn’t supported by the data you actually have doesn’t count, nor does having a really strong hunch that turns out to be right. In the scientific realm, empirical data are our basis for knowing what we know about the phenomena. Thus, doing the experiments over in the face of insufficient data is not “playing it safe” so much as “doing the job you were supposed to have done in the first place”.

Now, given the relative paucity of facts in this particular case, I find myself interested by a more general question: What are the ethical duties of a PI who discovers that he or she has published a paper whose findings are not, in fact, supported by the data?

It seems reasonable that at least one of his or her duties involves correcting the scientific literature.

This could involve retracting the paper, in essence saying, “Actually, we can’t conclude this based on the data we have. Our bad!”

It could also involve correcting the paper, saying, “We couldn’t conclude this based on the data we have; instead, we should conclude this other thing,” or, “We couldn’t conclude this based on the data we originally reported, but we’ve gone and done more experiments (or have repeated the experiments we described), obtained these data, and are now confident that, on the basis of these data, the conclusion is well-supported.”

If faulty data were reported, I would think that the retraction or correction should probably explain how the data were faulty — what’s wrong with them? If the problem had its source in an honest mistake, it might also be valuable to identify that honest mistake so other researchers could avoid it themselves. (Surely this would be a kindness; is it also a duty?)

Beyond correcting the scientific literature, does the PI in this situation have other relevant duties?

Would these involve ratcheting up the scrutiny of data within the lab group in advance of future papers submitted for publication? Taking the skepticism of other researchers in the field more seriously and working that much harder to build a compelling case for conclusions from the data? (Or, perhaps, working hard to identify the ways that the data might argue against the expected conclusion?) Making serious efforts to eliminate as much subjectivity from the data as possible?

Assuming the PI hasn’t fabricated or falsified the data (and that if someone in the lab group has, that person has been benched, at least for the foreseeable future), what kind of steps ought that PI to take to make things right — not just for the particular problematic paper(s), but for his or her whole research group moving forward and interacting with other researchers in the field? How can they earn back trust?

Research methods and primary literature.

At Uncertain Principles, Chad opines that “research methods” look different on the science-y side of campus than they do for his colleagues in the humanities and social sciences:

When the college revised the general education requirements a few years ago, one of the new courses created had as one of its key goals to teach students the difference between primary and secondary sources. Which, again, left me feeling like it didn’t really fit our program– as far as I’m concerned, the “primary source” in physics is the universe. If you did the experiment yourself, then your data constitute a primary source. Anything you can find in the library is necessarily a secondary source, whether it’s the original research paper, a review article summarizing the findings in some field, or a textbook writing about it years later.

In many cases, students are much better off reading newer textbook descriptions of key results than going all the way back to the “primary source” in the literature. Lots of important results in science were initially presented in a form much different than the fuller modern understanding. Going back to the original research articles often requires deciphering cumbersome and outdated notation, when the same ideas are presented much more clearly in newer textbooks.

That’s not really what they’re looking for in the course in question, though– they don’t want it to be a lab course. But then it doesn’t feel like a “research methods” class at all– while we do occasional literature searches, for the most part that’s accomplished by tracing back direct citations from recent articles. When I think about teaching students “research methods,” I think of things like teaching basic electronics, learning to work an oscilloscope, basic laser safety and operation, and so on. The library is a tiny, tiny part of what I do when I do research, and the vast majority of the literature searching I do these days can be done from my office computer.

I’m going to share some observations that may complicate Chad’s “two cultures” framing of research (and of what sorts of research methods one might reasonably impart to undergraduates in a course focused on research methods in a particular discipline).

Continue reading

In search of accepted practices: the final report on the investigation of Michael Mann (part 3).

Here we continue our examination of the final report (PDF) of the Investigatory Committee at Penn State University charged with investigating an allegation of scientific misconduct against Dr. Michael E. Mann made in the wake of the ClimateGate media storm. The specific question before the Investigatory Committee was:

“Did Dr. Michael Mann engage in, or participate in, directly or indirectly, any actions that seriously deviated from accepted practices within the academic community for proposing, conducting, or reporting research or other scholarly activities?”

In the last two posts, we considered the committee’s interviews with Dr. Mann and with Dr. William Easterling, the Dean of the College of Earth and Mineral Sciences at Penn State, and with three climate scientists from other institutions, none of whom had collaborated with Dr. Mann. In this post, we turn to the other sources of information to which the Investigatory Committee turned in its efforts to establish what counts as accepted practices within the academic community (and specifically within the community of climate scientists) for proposing, conducting, or reporting research.

Continue reading

In search of accepted practices: the final report on the investigation of Michael Mann (part 2).

When you’re investigating charges that a scientist has seriously deviated from accepted practices for proposing, conducting, or reporting research, how do you establish what the accepted practices are? In the wake of ClimateGate, this was the task facing the Investigatory Committee at Penn State University investigating the allegation (which the earlier Inquiry Committee deemed worthy of an investigation) that Dr. Michael E. Mann “engage[d] in, or participate[d] in, directly or indirectly, … actions that seriously deviated from accepted practices within the academic community for proposing, conducting, or reporting research or other scholarly activities”.
One strategy you might pursue is asking the members of a relevant scientific or academic community what practices they accept. In the last post, we looked at what the Investigatory Committee learned from its interviews about this question with Dr. Mann himself and with Dr. William Easterling, Dean, College of Earth and Mineral Sciences, The Pennsylvania State University. In this post, we turn to the committee’s interviews with three climate scientists from other institutions, none of whom had collaborated with Dr. Mann, and at least one of whom has been very vocal about his disagreements with Dr. Mann’s scientific conclusions.

Continue reading

In search of accepted practices: the final report on the investigation of Michael Mann (part 1).

Way back in early February, we discussed the findings of the misconduct inquiry against Michael Mann, an inquiry that Penn State University mounted in the wake of “numerous communications (emails, phone calls, and letters) accusing Dr. Michael E. Mann of having engaged in acts that included manipulating data, destroying records and colluding to hamper the progress of scientific discourse around the issue of global warming from approximately 1998”. Those numerous communications, of course, followed upon the well-publicized release of purloined email messages from the Climate Research Unit (CRU) webserver at the University of East Anglia — the storm of controversy known as ClimateGate.
You may recall that the misconduct inquiry, whose report (PDF) is here, looked into four allegations against Dr. Mann and found no credible evidence to support three of them. On the fourth allegation, the inquiry committee was unable to make a definitive finding. Here’s what I wrote about the inquiry committee’s report on this allegation:

[T]he inquiry committee is pointing out that researchers at the university have not only a duty not to commit fabrication, falsification, or plagiarism, but also a positive duty to behave in such a way that they maintain the public’s trust. The inquiry committee goes on to highlight specific sections of policy AD-47 that speak to cultivating intellectual honesty, being scrupulous in presentation of one’s data (and careful not to read those data as being more robust than they really are), showing due respect for their colleagues in the community of scholars even when they disagree with their findings or judgments, and being clear in their communications with the public about when they are speaking in their capacity as researchers and when they are speaking as private citizens. …
[W]e’re not just looking at scientific conduct here. Rather, we’re looking at scientific conduct in an area about which the public cares a lot.
What this means is that the public here is paying rather more attention to how climate scientists are interacting with each other, and to the question of whether these interactions are compatible with the objective, knowledge-building project science is supposed to be.
[T]he purloined emails introduce new data relevant to the question of whether Dr. Mann’s research activities and interactions with other scientists — both those with whose conclusions he agrees and those with whose conclusions he does not agree — are consistent with or deviate from accepted scientific practices.
Evaluating the data gleaned from the emails, in turn, raises the question of what the community of scholars and the community of research scientists agree counts as accepted scientific practices.

Decision 4. Given that information emerged in the form of the emails purloined from CRU in November 2009, which have raised questions in the public’s mind about Dr. Mann’s conduct of his research activity, given that this may be undermining confidence in his findings as a scientist, and given that it may be undermining public trust in science in general and climate science specifically, the inquiry committee believes an investigatory committee of faculty peers from diverse fields should be constituted under RA-10 to further consider this allegation.

In sum, the overriding sentiment of this committee, which is composed of University administrators, is that allegation #4 revolves around the question of accepted faculty conduct surrounding scientific discourse and thus merits a review by a committee of faculty scientists. Only with such a review will the academic community and other interested parties likely feel that Penn State has discharged its responsibility on this matter.

What this means is that the investigation of allegation #4 that will follow upon this inquiry will necessarily take up the broad issue of what counts as accepted scientific practices. This discussion, and the findings of the investigation committee that may flow from it, may have far reaching consequences for how the public understands what good scientific work looks like, and for how scientists themselves understand what good scientific work looks like.

Accordingly, an Investigatory Committee was constituted and charged to examine that fourth allegation, and its report (PDF) has just been released. We’re going to have a look at what the Investigatory Committee found, and at its strategies for getting the relevant facts here.
Since this report is 19 pages long (the report of the inquiry committee was just 10), I won’t be discussing all the minutiae of how the committee was constituted, nor will I be discussing this report’s five page recap of the earlier committee’s report (since I’ve already discussed that report at some length). Instead, I’ll be focusing on this committee’s charge:

The Investigatory Committee’s charge is to determine whether or not Dr. Michael Mann engaged in, or participated in, directly or indirectly, any actions that seriously deviated from accepted practices within the academic community for proposing, conducting, or reporting research or other scholarly activities.

and on the particular strategies the Investigatory Committee used to make this determination.
Indeed, establishing what might count as a serious deviation from accepted practices within the academic community is not trivially easy (which is one reason people have argued against appending the “serious deviations” clause to fabrication, falsification, and plagiarism in official definitions of scientific misconduct). Much turns on the word “accepted” here. Are we talking about the practices a scientific or academic community accepts as what members of the community ought to do, or about practices that are “accepted” insofar as members of the community actually do them or are aware of others doing them (and don’t do a whole lot to stop them)? The Investigatory Committee here seems to be trying to establish what the relevant scientific community accepts as good practices, but there are a few places in the report where the evidence upon which they rely may merely establish the practices the community tolerates. There is a related question about whether the practices the community accepts as good can be counted on reliably to produce the good outcomes the community seems to assume they do, something I imagine people will want to discuss in the comments.
Let’s dig in. Because of how much there is to discuss, we’ll take it in three posts. This post will focus on the committee’s interviews with Dr. Mann and with Dr. William Easterling, Dean, College of Earth and Mineral Sciences, The Pennsylvania State University (and Mann’s boss, to the degree that the Dean of one’s College is one’s boss).
The second post will examine the committee’s interviews with Dr. William Curry, Senior Scientist, Geology and Geophysics Department, Woods Hole Oceanographic Institution; Dr. Jerry McManus, Professor, Department of Earth and Environmental Sciences, Columbia University; and Dr. Richard Lindzen, Alfred P. Sloan Professor, Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology.
The third post will then examine the other sources of information besides the interviews that the Investigatory Committee relied upon to establish what counts as accepted practices within the academic community (and specifically within the community of climate scientists) for proposing, conducting, or reporting research. All blockquotes from here on out are from the Investigatory Committee’s final report unless otherwise noted.

Continue reading

#scio10 aftermath: my tweets from “Getting the Science Right: The importance of fact checking mainstream science publications — an underappreciated and essential art — and the role scientists can and should (but often don’t) play in it.”

Session description: Much of the science that goes out to the general public through books, newspapers, blogs and many other sources is not professionally fact checked. As a result, much of the public’s understanding of science is based on factual errors. This discussion will focus on what scientists and journalists can do to fix that problem, and the importance of playing a pro-active role in the process.
The session was led by Rebecca Skloot (@RebeccaSkloot), Sheril Kirshenbaum (@Sheril_), and David Dobbs (@David_Dobbs).
Here’s the session’s wiki page.

Continue reading

Dismal, yes, but is it science?

As I was driving home from work today, I was listening to Marketplace on public radio. In the middle of a story, reported by Nancy Marshall Genzer, about opponents of health care reform, there was an interesting comment that bears on the nature of economics as a scientific discipline. From the transcript of the story:

The Chamber of Commerce is taking a bulldozer to the [health care reform] bill. Yesterday, the Washington Post reported the Chamber is hiring an economist to study the legislation. The goal: more ammunition to sink the bill.
Uwe Reinhardt teaches economics at Princeton. He says, if the Chamber does its study, it will probably get the result it wants.
UWE REINHARDT: You can always get an economist with a PhD from a reputable university to give a scientific report that makes your case. So, yes, there will be such a study.

Continue reading

When collaboration ends badly.

Back before I was sucked into the vortex of paper-grading, an eagle-eyed Mattababy pointed me to a very interesting post by astronomer Mike Brown. Brown details his efforts to collaborate with another team of scientists who were working on the same scientific question he was working on, what became of that attempted collaboration, and the bad feelings that followed when Brown and the other scientists ended up publishing separate papers on the question.
Here’s how Brown lays it out:

Continue reading

Physical phenomena, competing models, and evil.

Over at Starts with a Bang, Ethan Siegel expressed exasperation that Nature and New Scientist are paying attention to (and lending too much credibility to) an astronomical theory Ethan views as a non-starter, Modified Newtonian Dynamics (or MOND):

[W]hy is Nature making a big deal out of a paper like this? Why are magazines like New Scientist declaring that there are cracks in dark matter theories?
Because someone (my guess is HongSheng Zhao, one of the authors of this paper who’s fond of press releases and modifying gravity) is pimping this piece of evidence like it tells us something. Guess what? Galaxy rotation curves are the only thing MOND has ever been good for! MOND is lousy for everything else, and dark matter — which is good for everything else — is good for this too!
So thanks to a number of people for bringing these to my attention, because the record needs to be set straight. Dark matter: still fine. MOND: still horribly insufficient. Now, maybe we can get the editors and referees of journals like this to not only do quality control on the data, but also on the reasonableness of the conclusions drawn.

In a comment on that post, Steinn took issue with Ethan’s characterization of MOND:

Ethan – this is not a creationism debate.
Hong Sheng is a top dynamicist and he knows perfectly well what the issues are. The whole point of science at this level is to test models and propose falsifiable alternatives.
MOND may be wrong, but it is not evil.
Cold Dark Matter is a likelier hypothesis, by far, but it has some serious problems in detail, and the underlying microphysics is essentially unknown and plagued with poorly motivated speculation.
MOND has always approached the issue from a different perspective: that you start with What You See Is What You Get, and then look for minimal modifications to account for the discrepancies. It is a phenomenological model, and makes little attempt to be a fundamental theory of anything. Observers tend to like it because it gives direct comparison with data and is rapidly testable.
I think Leslie Sage knew what he was doing when he published this paper.

In a subsequent post, Ethan responded to Steinn:

Yes, Steinn, it is evil to present MOND as though it is a viable alternative to dark matter.
It is evil to spread information about science based only on some tiny fraction of the available data, especially when the entire data set overwhelmingly favors dark matter and crushes MOND so as to render it untenable. It isn’t evil in the same way that creationism is evil, but it is evil in the same way that pushing the steady-state-model over the Big Bang is evil.
It’s a lie based on an unfair, incomplete argument. It’s a discredited theory attacking the most valid model we have at — arguably — its only weak point. Or, to use a favorite term of mine, it is willfully ignorant to claim that MOND is reasonable in any sort of way as an alternative to dark matter. It’s possibly worse than that, because it’s selectively willful ignorance in this case.
And then I look at the effect it has. It undermines public understanding of dark matter, gravity, and the Universe, by presenting an unfeasible alternative as though it’s perfectly valid. And it isn’t perfectly valid. It isn’t even close. It has nothing to do with how good their results as scientists are; it has everything to do with the invalid, untrue, knowledge-undermining conclusions that the public receives.
And yes, I find that incredibly evil. Do you?

I have no strong views on MOND or Cold Dark Matter, but given that my professional focus includes the methodology of science and issues of ethics in science, I find this back and forth really interesting.
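
For readers who, like me, don’t work in this area, it may help to sketch why galaxy rotation curves keep coming up in this exchange; take this as a lay summary rather than anything authoritative. As I understand it, MOND’s core move is a single modification of Newtonian dynamics at very low accelerations, below a threshold a₀:

    a_N = GM/r²   (ordinary Newtonian gravitational acceleration)
    μ(a/a₀) · a = a_N,   with μ(x) ≈ 1 when a ≫ a₀, μ(x) ≈ x when a ≪ a₀, and a₀ ≈ 1.2 × 10⁻¹⁰ m/s²

In the outer reaches of a galaxy, where a ≪ a₀, this gives a ≈ √(a_N a₀). For a star on a circular orbit, a = v²/r, so

    v²/r ≈ √(GM a₀)/r,   which means v⁴ ≈ GM a₀

and the predicted rotation speed stops depending on r: a flat rotation curve, which is what’s observed. That single success is what the remark about rotation curves being “the only thing MOND has ever been good for” is pointing at, and the disagreement above is over whether it can outweigh the much broader range of evidence Ethan says favors dark matter.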

Continue reading