Is objectivity an ethical duty? (More on the Hauser case.)

Today the Chronicle of Higher Education has an article that bears on the allegation of shenanigans in the research lab of Marc D. Hauser. As the article draws heavily on documents given to the Chronicle by anonymous sources, rather than on official documents from Harvard’s inquiry into allegations of misconduct in the Hauser lab, we are going to take them with a large grain of salt. However, I think the Chronicle story raises some interesting questions about the intersection of scientific methodology and ethics.

From the article:

It was one experiment in particular that led members of Mr. Hauser’s lab to become suspicious of his research and, in the end, to report their concerns about the professor to Harvard administrators.

The experiment tested the ability of rhesus monkeys to recognize sound patterns. Researchers played a series of three tones (in a pattern like A-B-A) over a sound system. After establishing the pattern, they would vary it (for instance, A-B-B) and see whether the monkeys were aware of the change. If a monkey looked at the speaker, this was taken as an indication that a difference was noticed. …

Researchers watched videotapes of the experiments and “coded” the results, meaning that they wrote down how the monkeys reacted. As was common practice, two researchers independently coded the results so that their findings could later be compared to eliminate errors or bias.

According to the document that was provided to The Chronicle, the experiment in question was coded by Mr. Hauser and a research assistant in his laboratory. A second research assistant was asked by Mr. Hauser to analyze the results. When the second research assistant analyzed the first research assistant’s codes, he found that the monkeys didn’t seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.

But Mr. Hauser’s coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success.

The second research assistant was bothered by the discrepancy. How could two researchers watching the same videotapes arrive at such different conclusions? He suggested to Mr. Hauser that a third researcher should code the results. In an e-mail message to Mr. Hauser, a copy of which was provided to The Chronicle, the research assistant who analyzed the numbers explained his concern. “I don’t feel comfortable analyzing results/publishing data with that kind of skew until we can verify that with a third coder,” he wrote.

A graduate student agreed with the research assistant and joined him in pressing Mr. Hauser to allow the results to be checked, the document given to The Chronicle indicates. But Mr. Hauser resisted, repeatedly arguing against having a third researcher code the videotapes and writing that they should simply go with the data as he had already coded it. After several back-and-forths, it became plain that the professor was annoyed.

“i am getting a bit pissed here,” Mr. Hauser wrote in an e-mail to one research assistant. “there were no inconsistencies! let me repeat what happened. i coded everything. then [a research assistant] coded all the trials highlighted in yellow. we only had one trial that didn’t agree. i then mistakenly told [another research assistant] to look at column B when he should have looked at column D. … we need to resolve this because i am not sure why we are going in circles.”

The research assistant who analyzed the data and the graduate student decided to review the tapes themselves, without Mr. Hauser’s permission, the document says. They each coded the results independently. Their findings concurred with the conclusion that the experiment had failed: The monkeys didn’t appear to react to the change in patterns.

They then reviewed Mr. Hauser’s coding and, according to the research assistant’s statement, discovered that what he had written down bore little relation to what they had actually observed on the videotapes. He would, for instance, mark that a monkey had turned its head when the monkey didn’t so much as flinch. It wasn’t simply a case of differing interpretations, they believed: His data were just completely wrong. …

The research that was the catalyst for the inquiry ended up being tabled, but only after additional problems were found with the data. In a statement to Harvard officials in 2007, the research assistant who instigated what became a revolt among junior members of the lab, outlined his larger concerns: “The most disconcerting part of the whole experience to me was the feeling that Marc was using his position of authority to force us to accept sloppy (at best) science.”

The big methodological question here is how best to extract objective data about the monkeys’ behavior in these experiments from the videotapes.

It’s hard to tell from the Chronicle article whether the audio from the experiments was audible during the “coding” of the monkey responses. I’d think that coding with the audio off would make it easier to extract objective data, since the researchers watching the tape wouldn’t be swayed (consciously or unconsciously) by auditory cues about what they expected or hoped to see the monkeys doing. Safer still would be to characterize only the visual record of what the monkeys were doing — where they were looking, whether they changed the direction of their gaze gradually or suddenly, and so forth — keyed to the time stamp on the video, and to add in the information about which tone patterns were playing at which time stamps only after the coding of the monkey responses was complete.
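
For concreteness, here is a minimal sketch (in Python) of what that "code first, unblind later" workflow might look like. The file names, column labels, and the two-second lookback window are my own illustrative assumptions, not details from the Hauser experiments: the coder records only time-stamped visual observations with the sound off, and the stimulus log is merged in only after coding is finished.

```python
# Minimal sketch of a "code blind, unblind later" workflow.
# File names, column labels, and the 2-second lookback window are
# illustrative assumptions, not details from the actual experiments.

import csv

def load_rows(path):
    """Read a CSV file into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def merge_codes_with_stimuli(codes, stimuli, window=2.0):
    """Attach each blind-coded observation to the most recent stimulus.

    `codes` rows:   {"timestamp": "12.4", "looked_at_speaker": "1"}
    `stimuli` rows: {"timestamp": "11.9", "pattern": "ABB"}
    Stimulus information is only consulted here, after coding is done.
    """
    stimuli = sorted(stimuli, key=lambda r: float(r["timestamp"]))
    merged = []
    for obs in codes:
        t = float(obs["timestamp"])
        # Most recent stimulus onset at or before this observation,
        # within the lookback window.
        candidates = [s for s in stimuli
                      if 0.0 <= t - float(s["timestamp"]) <= window]
        pattern = candidates[-1]["pattern"] if candidates else None
        merged.append({**obs, "pattern": pattern})
    return merged

if __name__ == "__main__":
    codes = load_rows("blind_codes.csv")      # produced with the audio off
    stimuli = load_rows("stimulus_log.csv")   # produced by the playback system
    for row in merge_codes_with_stimuli(codes, stimuli):
        print(row)
```

The point of the design is simply that the person doing the coding never sees the stimulus log; the join happens in software, afterward.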

Researchers who have actually done this kind of observational work with young humans, non-human primates, or other animals could probably offer other strategies for making sure the coding yields data that are as objective as possible.

In any case, there are some general questions that researchers here ought to pose:

  • If there’s a worry about the objectivity of data interpretation in your research, in what circumstances would you not want to bring in one or more fresh sets of eyes?
  • Should you really assume a mistake in the comparison of two sets of coding rather than a genuine inconsistency between the two sets of coded data? (Rather than assuming either possibility as the source of the disagreement, wouldn’t it be prudent to actually investigate it, for instance with the kind of agreement check sketched after this list?)
  • Is there something worrisome about prioritizing the boss’s coding of the data? The researchers (including the boss) are supposed to be looking for objective results — ideally, something “anyone” could observe in the animals’ behaviors. It’s true that some kinds of data may be hard to observe without special training or expertise. Still, you want the data you report to be a matter of actual observation, not intuition.
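
On that second question, the standard way to quantify how well two coders’ judgments line up is an inter-rater agreement statistic such as Cohen’s kappa. Here is a rough sketch in plain Python; the trial labels are made up, and a real study would more likely use a dedicated statistics package than a hand-rolled function, but it shows how one might check agreement before arguing about whose coding to trust.

```python
# Rough sketch: Cohen's kappa for two coders' trial-by-trial judgments.
# Example labels are invented; real analyses typically use a statistics
# package rather than a hand-rolled function like this one.

from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[label] / n) * (freq_b[label] / n) for label in labels)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Each entry: did the monkey orient to the speaker on that trial?
coder_1 = ["look", "no_look", "look", "no_look", "no_look", "look"]
coder_2 = ["no_look", "no_look", "look", "no_look", "no_look", "no_look"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")
```

A low kappa doesn’t tell you which coder is wrong, but it does tell you that someone needs to go back to the tapes, which is exactly what the research assistants were asking for.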

Set aside the question of whether the documents leaked to the Chronicle are accurate. While you’re at it, set aside the question of whether the behavior the Chronicle article attributes to Hauser crosses the line into scientific misconduct. I think it’s fair to say that honest science requires scientists to take reasonable steps to ensure that they are as objective as possible about the data they report. What do you think “best practices” should be for getting objective observational data in this kind of research?

Hat-tip: John Fleck, via Twitter.

Posted in Humanities & Social Science, Current events, Ethical research, Methodology, Professional ethics, Research with animals.

6 Comments

  1. “The big methodological question here is how best to extract objective data about the monkeys’ behavior in these experiments from the videotapes.”

    1) Use an eyetracker instead of subjective coding.

    2) Use a double-blind procedure where those analyzing the data are unaware of the sound patterns in each trial.

  2. Is there something worrisome about prioritizing the boss’s coding of the data?

    Not really. Often there is a fair level of subjectivity in scoring data, and the boss will have more experience. Of course, the boss will have trained the students, so their scores should coincide pretty closely, and discrepancies like the ones found here need to be checked.

    When I was a plant pathologist, we used to score mildew infection on a 0-4 scale. I learned it in one lab in the UK, and then went to another, in Denmark. I sat down with the Grand Old Man to compare our scorings, and we were pretty close – the only differences were a couple of intermediates which are always a bit difficult. I’d argue that we were both subjective, but through training we can converge.

  3. A widely accepted practice in behavioral and other research where subjectivity could stick its unwanted nose into the interpretation of an experiment is the use of blind experiments (the person recording the data does not know the treatment combination of the subject) and double-blind experiments (more elaborate masking of treatments from the researchers, even through data processing and analysis). It is strange to me that the journals and the granting agencies dealing with this work would not have required at least blind recording of results. For example, in my work on diseases that attack plants, and on the effect of those diseases on other organisms that can thwart their effect on the plant, we use double-blind procedures.

  4. I’m surprised that no one is paying any attention to the statement in Hauser’s quoted email that there was no inconsistency (“we had only one trial that didn’t agree”), and that the appearance of inconsistency came about only because the research assistant wasn’t comparing the appropriate columns of data (in some kind of spreadsheet, I assume).

    Now, I have no idea of what the columns represent. I have no idea of what the trials “highlighted in yellow” were. No one does, outside of the Harvard investigation. I also don’t know whether the scoring of the video was blind or not relative to the changes in musical tone. I agree that it should have been, but it’s not clear to me in the quoted passage. However, a quick check of a couple of Hauser’s papers (in Proc. Royal Soc.) reveals that they used double blind procedures in scoring video tapes of behavior. I would conjecture that this was standard protocol because, as several commenters have pointed out, it is obvious.

    But I am a bit concerned that no one, in a discussion based on a single piece of text, seems to be paying any attention to what the accused party is reported to have said, relative to what the accusing party is reported to have said. I hope that this is not because it is more fun to think about scandal hitting a high-profile Harvard professor than it is to think about a research assistant not understanding the data they are supposed to be analyzing.

    I have to say that the whole thing keeps making me remind myself that there have been no reliable reports of what is actually alleged to have happened.

  5. I must be missing something, but Hauser’s explanation, as ecologist puts it (or as I understand it), doesn’t make much sense to me. If it were just a matter of “No, dude, this column is the data,” then I don’t really understand why it would have involved anyone else in the group or generated such suspicion among them, let alone why it persisted once it got outside of the group (unless, for some reason, his group was framing him; and if the accuser were just another group member, I wouldn’t expect their word to carry as much weight as the advisor’s unless things are really bad in the group or people are really stupid). It seems like something that shouldn’t have gone anywhere.

    His explanation sounds too much like “My trainer gave me the wrong needle” to pass the sniff test.

  6. This intimidation of subordinates to artificially prop up the professor’s career is pure Harvard. Dealing with the medical careerists is much more difficult and dangerous. This is really a minor event overall; the interesting part is how long it took to properly address the failures.
