The Hellinga Retractions (part 1): when replication fails, what should happen next?

Because Abi asked me to, I’m going to discuss the fascinating case of the Hellinga retractions. Since this is another case where there is a lot to talk about, I’m going to take it in two parts. In the first part, working from the Chemical & Engineering News article by Celia Henry Arnaud (May 5, 2008) [1], I’ll focus on the common scientific activity of trying to build on a piece of published work. What happens when the real results seem not to fit with the published results? What should happen?

In part 2, drawing from the Nature news feature by Erika Check Hayden (May 15, 2008) [2], I’ll consider this case in the context of scientific collaborations — both within research groups and between research groups. In light of the differentials in experience and power (especially between graduate students and principal investigators), who is making the crucial scientific decisions, and on the basis of what information?

But let’s start with the papers [3,4] that came out of the research group of Homme W. Hellinga, professor of biochemistry at Duke University.

In the original papers, Hellinga and coworkers claimed to use computational methods to design proteins called NovoTIMs that catalyze the same reaction catalyzed by the enzyme triosephosphate isomerase (TIM). Although the reported kinetic values for the best NovoTIM weren’t as efficient as the natural TIM enzyme, the work offered hope that scientists would someday be able to design proteins capable of catalyzing any reaction. (40)

The papers were published. And, as one would hope, the papers were read by other biochemists. (Ideally, the successful communication of results includes someone on the receiving end.)

One of the readers was biochemist John P. Richard at SUNY-Buffalo. Richard studies the natural TIM enzyme and was interested in nailing down why the designed ones displayed lower activity. Figuring this out might shed light on the sorts of relations between structure and function that keep biochemists up at night.

Richard and his coworkers used the method published by Hellinga’s group to make NovoTIM in bacteria and then purify it. But the protein they isolated had much higher kinetic values than the Hellinga team had reported for the best of its designed enzymes, Richard says. He and his coworkers suspected that this activity resulted from contamination with wild-type TIM from the bacteria used to produce NovoTIM.
When they used a different purification strategy to isolate the expressed protein, the protein they harvested showed no TIM activity. (40)

Richard’s results suggested that Hellinga’s published findings were in error. Richard passed this information on to Hellinga, as well as to the journals that published the papers. Alerted to Richard’s results, Hellinga’s lab did the same experiments again, this time using Richard’s purification strategy, and they got the same results Richard did. Hellinga then retracted the published papers (the Science paper on Feb. 1, 2008, and the Journal of Molecular Biology paper on Feb. 23, 2008).

At this point in the story, you may be thinking to yourself that this is a perfect example of science working as it should. We have scientists communicating their results, scientists taking account of the results of others, scientists trying to build further knowledge from reported results, and scientists communicating with each other when reported results don’t hold up. Published work is not standing encased in Lucite, but instead is checked, rechecked, and corrected.

As Merton might say, organized skepticism, baby!

Now, no scientist wants to publish results that, on closer examination, don’t hold up, but the default assumption is that this sort of thing is probably an honest mistake. Building new knowledge on the frontiers of what we already know, you may not know enough about the system you’re studying to have your techniques perfectly refined or your controls perfectly designed. As well, there is the eternal question of how certain you must be to publish what you’ve found. If you wait for absolute certainty, you’ll never publish, so you have to make a judgment call about the line that defines “certain enough”. Indeed, getting results out sooner rather than later means that others can use them sooner — and having other scientists in your community engaging with the same systems and problems might well speed up the process of finding problems with your initial results.

Of course, as biochemists like Richard know all too well, it costs those others (like him) significant time and resources to try to repeat published findings and discover that they are not reproducible. If you started from a published finding in order to pursue some other scientific question that assumed its reliability, you’ve also discovered that you can’t do the project you set out to do. And there is no great career payoff for demonstrating that another scientist’s published result is wrong.

Still, all research is a gamble, and you might just chalk this up to the risks inherent in participating in a self-correcting community-wide project of knowledge-building.

However, Richard and others were concerned that what this expenditure of time and money had uncovered was not simply an “honest mistake”. There were aspects of the published results that didn’t make sense in the context of the type of proteins presumed to be in the experimental system. The puzzling details were described by UC-Berkeley biochemist Jack F. Kirsch:

“The retraction only admitted to contamination with wild-type enzyme,” Kirsch tells C&EN. “That doesn’t explain the very low KM values that they reported for NovoTIM.” KM, also known as the Michaelis constant, is a reaction parameter that defines the substrate concentration at which the reaction reaches half its maximal velocity. If the wild-type enzyme is the contaminant, Kirsch points out, “it’s very hard to think of a way you could get a KM value that is much lower than that of the wild-type enzyme.”

In his letter [to Science, published online March 10, 2008], Kirsch further pointed out that some of the reported results would only make sense if the design had actually succeeded. For example, Hellinga’s team reported that as each of the three critical active-site residues in the designed protein was replaced with alanine, the mutants became less active. Double mutants are less active than single mutants, and triple mutants are the least active of all. “That’s not what you would expect if there was random wild-type contamination,” Kirsch says. (40-41)
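
To spell out the kinetics behind Kirsch’s point, recall the standard Michaelis–Menten rate law (textbook biochemistry, nothing particular to these papers):

v = \frac{V_{\max}[S]}{K_M + [S]}, \qquad \text{so } v = \frac{V_{\max}}{2} \ \text{when}\ [S] = K_M

Because KM reflects the identity of the enzyme rather than how much of it is present, a sample whose only catalytically active species is contaminating wild-type TIM should show (roughly) the wild-type KM. A reported KM well below the wild-type value is therefore hard to square with contamination alone, which is just Kirsch’s point.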

How to make sense of the incorrect data reported in the Hellinga papers? Could these data possibly have come from NovoTIM contaminated with the wild-type enzyme? That was hard to reconcile with the reported KM values. Were the experimental measurements badly controlled, or carelessly taken? Had researchers in the Hellinga group thrown out good data because they didn’t support the result they were expecting (and kept bad data because they did)?

Could the data depart from the experimental reality (as Richard and coworkers found it) because they were made up?

Compounding the mystery of what could have been going on in the Hellinga group’s experiments to produce the data reported in the two papers, University of Illinois, Urbana-Champaign biochemist John A. Gerlt, along with Richard, argued that even if the NovoTIM hadn’t been contaminated, the results the Hellinga group seems to have expected weren’t to be expected at all — at least, not on the basis of the structural features of the protein Hellinga’s group had designed.

TIM catalyzes the interconversion of dihydroxyacetone phosphate (DHAP) and D-glyceraldehyde 3-phosphate (GAP). However, the NovoTIM protein, as Hellinga’s group described it, abstracts a proton that results in the formation of L-GAP. In a letter to Hellinga, Richard explained the situation in more detail: “Any protein designed to catalyze suprafacial transfer of the pro-S hydrogen of DHAP via a cis-enediol[ate] intermediate would form L-GAP and would, therefore, not show activity using the standard enzymatic assay for isomerization of DHAP. This is because L-GAP is not a substrate for the coupling enzyme [glyceraldehyde 3-phosphate dehydrogenase] used in the assay for isomerization of DHAP.” Therefore, even if the designed protein had worked, the assays shouldn’t have given a positive response. (41)

The fit between enzyme and substrate is crucial for enzyme activity, and is sometimes described as a “hand in glove” relationship. The assays reported in Hellinga’s papers used the “coupling enzyme” to track the production of D-GAP (as a way to measure the activity of TIM in converting DHAP to D-GAP). However, the NovoTIM would convert DHAP to L-GAP — the left hand to D-GAP’s right hand. And the “left hand” (L-GAP) wouldn’t fit properly into the “coupling enzyme” glove that was specific for the “right hand” (D-GAP).

How, in other words, could the NovoTIM have yielded any activity at all with this assay? If the only thing in the samples with enzyme activity was the wild-type enzyme from the bacteria, why didn’t it show the characteristic TIM activity in the assay?

All of these issues seemed to deepen the mystery of the initially published and now retracted results. Why should the assays described in the papers have given the reported results? Aside from the wild-type TIM contaminating them, what else was in those samples?

Hellinga himself seems not terribly curious about the answers to these questions.

Hellinga tells C&EN that he decided not to address such questions because he believes the fact that the designed enzyme ultimately didn’t show TIM activity made many other points moot. “I didn’t see a reason to go into the design methodology because the experiment clearly didn’t work,” he says. “By inference, obviously the design was wrong.” (41)

Manifestly, there are reasons to be curious about the withdrawn results. How else could researchers — in Hellinga’s research group or in other research groups — avoid similar mistakes in future experiments? Mightn’t following up on some of these puzzles lead to the discovery of something unexpected? (Perhaps the methods described didn’t result in the synthesis of the target protein but yielded something else with interesting properties.)

Publishing your results is sharing your questions, your results, and any puzzle that may arise from them with your scientific community. Once you’ve shared them, it’s no longer just a question of what your original goals happened to be when you initiated the project. As the scientist who put those results into the conversation, you’re now on the hook to help answer the community’s questions about the results you published. This makes Hellinga’s apparent “moving on” attitude toward the legitimate puzzles arising from his results seem a little off.

That attitude cannot help but highlight another legitimate question around these results: can we trust you? Is yours a research group that runs good experiments, collects reliable data, and makes accurate reports to the rest of the scientific community? When the reports don’t hold up, will you be accountable to the community to figure out what went wrong so that we all will benefit from that information?

In part 2, when we look at the ways collaboration between scientists was at work in producing, and then toppling, these results, there will be more to say about trust and accountability.
_______
[1] Celia Henry Arnaud, “Enzyme Design Papers Retracted,” Chemical & Engineering News, 86(18), 40-41 (May 5, 2008).

All quotations in the post are from this article, with page numbers given parenthetically.

[2] Erika Check Hayden, “Designer Debacle,” Nature, 453, 275-278 (May 15, 2008).

[3] Dwyer, M.A., Looger, L.L., and Hellinga, H.W., Science 304, 1967-1971 (2004).

[4] Allert, M., Dwyer, M.A., and Hellinga, H.W., Journal of Molecular Biology 366, 945-953 (2007).

Posted in Chemistry, Communication, Ethical research, Methodology.

10 Comments

  1. If I read all that correctly, the way activity was measured was through an enzyme coupling with D-GAP. Would it be possible for that enzyme to have been contaminated with a form that coupled with L-GAP, but was otherwise the same?

  2. Thanks for posting this. I get C&E News, but I admit most of the time I don’t read it.
    I’ve always been told that retracting a paper is something that is sure to ruin your career entirely. Is this the case? I know that the idea of ruining my career is enough to make sure I don’t publish unless I’m 99% sure (within a SEM of 0.8, p

  3. Publishing your results is sharing your questions, your results, and any puzzle that may arise from them with your scientific community. Once you’ve shared them, it’s no longer just a question of what your original goals happened to be when you initiated the project. As the scientist who put those results into the conversation, you’re now on the hook to help answer the community’s questions about the results you published. This makes Hellinga’s apparent “moving on” attitude toward the legitimate puzzles arising from his results seem a little off.

    As I like to put it, “It’s not about *you*. It’s about arriving at a reasonable conclusion concerning some feature of objective reality.”

  4. Thanks for this post (and the forthcoming one, too!), Janet.
    When I wrote to you, my concerns were directed more at how Hellinga (ill)treated his (former) student and co-author. I felt that it was horribly unfair that, after their paper ran into serious trouble, Hellinga started by accusing his student of falsification and fabrication (I’m glad you will be talking about this part in Part 2).
    But I see that there were many other important angles to this controversy. Thanks for addressing them.
    I can’t wait for Part 2!

  5. I’ll focus on the common scientific activity of trying to build on a piece of published work. What happens when the real results seem not to fit with the published results?
    I’d note that this case is an extremely non-representative example of how this “common scientific activity” plays out, for a variety of reasons: the high profile and complete, objective wrongness of the original paper, the well-connectedness of Richard, the apparently widespread dislike of Hellinga.
    The vast majority of discrepancies between new and published work don’t end in anything like this. Usually there’s either a few new papers that contradict the old one and weight accumulates on the other side, or, more often, just a silence in the literature after the original paper.

  6. A nice summary of events.
    An important point has not been mentioned enough. Hellinga and co-workers knew something was very wrong in the Science paper [3] because they corrected it themselves in their JMB paper [4].
    Their admission of error and its correction appeared in a footnote to a Table 1 in the JMB paper [4] that states regarding the ecNovoTIM1.2 kinetics: “This KM value has been revised [to 7.1 mM] from the originally published value (0.18 mM),9 because the fit to the previously reported measurements was incorrect. The original kcat value of ecNovoTIM1.2 has not been revised.”
    Because we now know that the measured values for ecNovoTIM1.2 [Km 7.1 mM] derived from contaminating wild-type TIM [Km of 1.6 mM], these numbers must match one another. Three problems that should have raised red flags to someone:
    1) How can you plot a Km curve and get a Km of 0.18 mM when the number should be 7.1 mM? By eyeball you should easily detect the 40-fold error.
    2) When the Kms for the NovoTIMs matched the Km for wild-type TIM, would that not send up a red flag to you (or reviewers) suggesting contamination of wild-type TIM?
    3) Where are the error measurements for the NovoTIM constructs? Any program that spits out a Km or Vmax value also spits out an error value for that number based on the scatter of the points. It should be standard practice.
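
    To illustrate that last point with made-up numbers (just a sketch, not data from either paper): any standard nonlinear least-squares fit of the Michaelis–Menten equation, here scipy.optimize.curve_fit, hands back uncertainties for Km and Vmax along with the fitted values, so there is no extra work in reporting them.

    import numpy as np
    from scipy.optimize import curve_fit

    def michaelis_menten(s, vmax, km):
        # initial velocity as a function of substrate concentration s
        return vmax * s / (km + s)

    rng = np.random.default_rng(0)
    s = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0])  # substrate concentrations, mM (invented)
    vmax_true, km_true = 1.0, 7.1                         # illustrative values only, not from the papers
    v = michaelis_menten(s, vmax_true, km_true) * (1 + 0.05 * rng.standard_normal(s.size))

    popt, pcov = curve_fit(michaelis_menten, s, v, p0=[1.0, 5.0])
    perr = np.sqrt(np.diag(pcov))                         # standard errors from the covariance matrix
    print(f"Vmax = {popt[0]:.2f} +/- {perr[0]:.2f}")
    print(f"Km   = {popt[1]:.2f} +/- {perr[1]:.2f} mM")

    A fit that put Km at 0.18 mM when the data actually supported something closer to 7 mM would also show up immediately in a plot of the fit against the points, which is the eyeball check from point 1.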

  7. Science: going to Hellinga handbasket.

    Excellent. On the retracting a paper front and the fallout, I was interested to read about the retraction of a paper about fish oil for osteoarthritis.

    Following further work in our laboratories, we have discovered that the article contains several examples of incorrect presentation of scientific data…Factual data related to this study will be submitted to scientific journals for peer review and future publication.

    What is the likelihood of the data being accepted for further publication?
