Evaluating scientific reports (and the reliability of the scientists reporting them).

One of the things scientific methodology has going for it (at least in theory) is a high degree of transparency. When scientists report findings to other scientists in the community (say, in a journal article), it is not enough for them to just report what they observed. They must give detailed specifications of the conditions in the field or in the lab — just how did they set up and run that experiment, choose their sample, make their measurement. They must explain how they processed the raw data they collected, giving a justification for processing it this way. And, in drawing conclusions from their data, they must anticipate concerns that the data might have been due to something other than the phenomenon of interest, or that the measurements might better support an alternate conclusion, and answer those objections.

A key part of transparency in scientific communications is showing your work. In their reports, scientists are supposed to include enough detailed information so that other scientists could set up the same experiments, or could follow the inferential chain from raw data to processed data to conclusions and see if it holds up to scrutiny.

Of course, scientists try their best to apply hard-headed scrutiny to their own results before they send the manuscript to the journal editors, but the whole idea of peer review, and indeed the communication around a reported result that continues after publication, is that the scientific community exercises “organized skepticism” in order to discern which results are robust and reflective of the system under study rather than wishful thinking or laboratory flukes. If your goal is accurate information about the phenomenon you’re studying, you recognize the value of hard questions from your scientific peers about your measurements and your inferences. Getting it right means catching your mistakes and making sure your conclusions are well grounded.

What sort of conclusions should we draw, then, when a scientist seems resistant to transparency, evasive in responding to concerns raised by peer reviewers, and indignant when mistakes are brought to light?

It’s time to revisit the case of Stephen Pennycook and his research group at Oak Ridge National Laboratory. In an earlier post I mused on the saga of this lab’s 1993 Nature paper [1] and its 2006 correction [2] (or “corrigendum” for the Latin fans), in light of allegations that the Pennycook group had manipulated data in another recent paper submitted to Nature Physics. (In addition to the coverage in the Boston Globe (PDF), the situation was discussed in a news article in Nature [3] and a Nature editorial [4].)

Now, it’s time to consider the recently uploaded communication by J. Silcox and D. A. Muller (PDF) [5] that analyzes the corrigendum and argues that a retraction, not a correction, was called for.

It’s worth noting that this communication was (according to a news story at Nature about how the U.S. Department of Energy handles scientific misconduct allegations [6]) submitted to Nature as a technical comment back in 2006 and accepted for publication “pending a reply by Pennycook.” Five years later, uploading the technical comment to arXiv.org makes some sense, since a communication that never sees the light of day doesn’t do much to further scientific discussion.

Given the tangle of issues at stake here, we’re going to pace ourselves. In this post, I lay out the broad details of Silcox and Muller’s argument (drawing also on the online appendix to their communication) as to what the presented data show and what they do not show. In a follow-up post, my focus will be on what we can infer from the conduct of the authors of the disputed 1993 paper and 2006 corrigendum in their exchanges with peer reviewers, journal editors, and the scientific community. Then, I’ll have at least one more post discussing the issues raised by the Nature news story and the related Nature editorial on the DOE’s procedures for dealing with alleged misconduct [7].


Faced with a manuscript to review or a published scientific paper, a reasonable set of questions for a scientist to ask might be:

  • What was measured or observed in the described experiment (i.e., what are the data)?
  • How were those data processed?
  • What conclusions are supported by the processed data?

In their analysis, Silcox and Muller respond to the 2006 correction offered to the 1993 paper, but this analysis sheds light on some worrying issues with the 1993 paper that were raised by peer reviewers before that paper was published in the first place. Among these problems:

  1. The reported data changed from first draft to second to third.
  2. The concentration profiles the authors reported did not match the profiles actually calculated by applying the described method of data processing to the reported spectra.
  3. If the spectra they reported were analyzed using the methods described, the results would not support the conclusion of the initial paper (that they found an atomically abrupt profile). In other words, the data don’t support the conclusion.

At first glance, the first of these issues might not strike you as particularly problematic, so let’s consider it more closely. The 1993 paper argued that it was possible to produce, with atomic resolution, an image of the boundary between two thin layers, one of cobalt silicide and one of silicon. The data included in the original draft of the paper sent out for review consisted of 7 spectra, which were supposed to have been obtained with a scanning transmission electron microscope at positions on either side of this boundary.

In this original draft, spectrum 5 included a cobalt core edge that was “questioned as being inconsistent with an atomic resolution profile.” The paper asserted that the measurement with the scanning transmission electron microscope was achieving atomic resolution, yet the spectral data in spectrum 5 looked too gradual to support this claim. At the same time, the line profile reportedly drawn, in further processing, from the originally processed spectral data* looked too sharp — the basic idea being that you can’t get a sharp image from a blurry object — and, moreover, too sharp for what should even have been theoretically possible given the resolution of the microscope.
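
To see why “too sharp” raises a red flag, note that the measured concentration profile is, roughly, the true profile convolved with the microscope’s probe, so even a perfectly abrupt interface cannot produce a measured profile narrower than the probe itself. Here is a minimal numerical sketch of that point; the probe width, step profile, and 25-75% width measure are hypothetical choices for illustration, not the parameters of the actual experiment.

    import numpy as np

    # Hypothetical 1-D model: positions (in angstroms) across the interface.
    x = np.linspace(-20, 20, 4001)

    # A perfectly abrupt (step) cobalt concentration profile.
    true_profile = (x < 0).astype(float)

    # Gaussian probe with an assumed 2.2 angstrom full width at half maximum.
    fwhm = 2.2
    sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    probe = np.exp(-0.5 * (x / sigma) ** 2)
    probe /= probe.sum()

    # The measured profile is the true profile convolved with the probe.
    measured = np.convolve(true_profile, probe, mode="same")

    def width_25_75(profile):
        """Distance over which the profile falls from 75% to 25% of maximum."""
        return x[profile >= 0.25].max() - x[profile >= 0.75].max()

    print("25-75%% width of the measured profile: %.2f angstroms"
          % width_25_75(measured))
    # The width is set by the probe, not by the (perfectly sharp) specimen,
    # so a reported profile sharper than the probe allows is a warning sign.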

The referee raised concerns.

In the second draft, prepared to address concerns raised by the referees, spectrum 5 had been replaced by 5′, which showed no cobalt edge. Spectra 6 and 7 had both been replaced by a single new curve, 7′. Spectra 1-4 appeared not to have changed at all. At the time, the authors “provided a detailed explanation as to why these [i.e., spectra 1-4] were different curves, despite looking the same.” Silcox and Muller argue that digitization shows these spectra are the same, and the authors of the 1993 paper and 2006 corrigendum now admit as much.
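
How does one show that two published curves are “the same” in the face of a detailed explanation of why they differ? The check is mechanical: trace each plotted curve, put both on a common energy axis, and see whether the residuals rise above the tracing noise. Here is a rough sketch of that kind of comparison; the file names, scale fit, and noise estimate are hypothetical, not Silcox and Muller’s actual procedure.

    import numpy as np

    def load_digitized(path):
        """Load a curve traced from a published figure: energy (eV), intensity."""
        data = np.loadtxt(path)
        return data[:, 0], data[:, 1]

    # Hypothetical files holding points traced from the two drafts' figures.
    e1, i1 = load_digitized("spectrum1_draft1.txt")
    e2, i2 = load_digitized("spectrum1_draft2.txt")

    # Interpolate both curves onto a common energy axis.
    e_common = np.linspace(max(e1.min(), e2.min()), min(e1.max(), e2.max()), 500)
    c1 = np.interp(e_common, e1, i1)
    c2 = np.interp(e_common, e2, i2)

    # Allow an overall scale and offset (different plot axes), then compare.
    design = np.vstack([c2, np.ones_like(c2)]).T
    scale, offset = np.linalg.lstsq(design, c1, rcond=None)[0]
    residuals = c1 - (scale * c2 + offset)

    # Crude estimate of the point-to-point noise in the traced curve.
    noise = np.std(np.diff(c1)) / np.sqrt(2)
    print("rms residual: %.3g   estimated noise: %.3g" % (residuals.std(), noise))
    # Residuals at the level of the tracing noise mean the two "independently
    # reprocessed" curves are, for practical purposes, the same curve.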

In the third draft, spectral curve 6 (which was replaced with 7′ in the second draft) was replaced with 6′. The authors identified the duplication of the last two spectra in the second draft as due to a mistake in printing the data sets, which they corrected in the third and final version of the paper.

For all of these changes, the authors claimed that they had not in fact changed the original data (the CCD readouts) from which they were drawing their conclusions, but instead that they had reprocessed all the original spectral data. Yet, while they claimed (in 1993) to have reprocessed spectra 1-7, spectra 1-4 appear to have been utterly unaltered by the purported reprocessing. The claimed reprocessing that resulted in spectra 5′-7′ should also have altered spectra 1-4 in some way. When this concern was raised by a referee, the authors replied, “we find it rather insulting to suggest that we would use different analysis methods for different spectra.” In other words, their claim was that they had analyzed all 7 spectra the same way.

At least, that was their claim in 1993. In the 2006 corrigendum, they acknowledge that spectra 5-7 were processed differently from spectra 1-4 when the background beneath the spectra was removed:

[W]e have now concluded that only spectra 5-7 were processed in the manner described; for spectra 1-4, owing to an error, the data were reproduced from ref. 7, where a standard exponential background subtraction was used. The exponential background subtraction method is widely used in the field and is an entirely acceptable method of analysis, and therefore this error in no way affects the key scientific claim of the paper, namely that it is possible to perform atomic-resolution chemical analysis in the scanning transmission electron microscope.

They suggest that the spectra on the two sides of the interface needed to be treated differently, or at least that it is “entirely acceptable” to do so. However, since the data in the 7 spectra were presumably the evidential basis for the claim that there was an abrupt interface in the first place, how would they know in advance which of the spectra were on which side of the interface? Without some independent justification for the choice of analysis method for each spectrum in the set, the argument starts to look circular.

Even the claim that the three spectra 5-7 were reprocessed the same way to yield the curves in the third and final version of the paper looks shaky. Silcox and Muller note that whatever processing made the cobalt feature in spectrum 5 disappear to yield 5′ should, applied to 6 and 7 in the same way, have left “negative features” — features that are not seen in 6′ and 7′. Indeed, applying a number of different methods that might have been used to process the spectral curves failed to turn up any consistent way of processing 5-7 to arrive at 5′-7′.
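
For readers who haven’t done this kind of analysis, “background subtraction” here means fitting a smooth decaying function to the spectrum just before the core edge and subtracting its extrapolation from under the edge; whatever survives is attributed to the element of interest, and the “negative feature” test amounts to asking what the same recipe does when applied to the other spectra. Below is a minimal sketch, assuming a simple exponential background and synthetic data; the model, energy window, and numbers are illustrative assumptions, not the processing actually used in the 1993 paper.

    import numpy as np

    def subtract_exponential_background(energy, counts, pre_edge=(700.0, 770.0)):
        """Fit I(E) = A * exp(b * E) in the pre-edge window and subtract its
        extrapolation from the whole spectrum."""
        window = (energy >= pre_edge[0]) & (energy <= pre_edge[1])
        # A linear fit of log(counts) vs. energy gives the exponential parameters.
        b, log_A = np.polyfit(energy[window], np.log(counts[window]), 1)
        return counts - np.exp(log_A + b * energy)

    # Synthetic spectrum: decaying background plus a step-like cobalt L-edge
    # (onset near 779 eV), purely to show what the subtraction should leave.
    energy = np.linspace(700, 900, 400)
    background = 5000.0 * np.exp(-energy / 300.0)
    edge = 40.0 * (energy > 779)
    counts = background + edge + np.random.normal(0, 5, energy.size)

    signal = subtract_exponential_background(energy, counts)
    # 'signal' should sit near zero before the edge and near 40 counts after it.
    # If the same recipe, applied to a spectrum said to contain no cobalt,
    # instead dug a large dip below zero (a "negative feature"), that would be
    # a sign that the recipe, or the data, was not what had been described.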

The best conclusion seems to be that 5′-7′ really are different data from those initially reported, or that the authors didn’t process the data from spectra 5-7 according to the methods described in the paper.

Wouldn’t the omission of 3 (out of 7) of the original spectra and the inclusion of 3 new spectra call for some explanation? Were the 3 spectra that were omitted measurements from a totally different experiment, or measurements made on a day when the scanning transmission electron microscope was on the fritz? If that were the case, why wouldn’t the authors say as much, rather than asserting that it was the same data as in the original manuscript, simply reprocessed (whether using a single method to remove the background or using two distinct methods as claimed in the 2006 corrigendum)? Indeed, if the 3 spectra that were omitted were legitimate measurements from the experiment, and if the referee was correct that they were notable because they seemed not to support the conclusions the authors were drawing, omitting them rather than dealing with them would seem a significant departure from the kind of skepticism scientists are supposed to apply.

If the analysis methods applied to the data were not those described in the paper, the reader of the paper is left unable to work through the analysis and convince herself (1) that the reported data, processed as described, yield what is presented as the processed data in the paper, and (2) that the processed data support the conclusion the authors draw from them. When the described analysis methods do not correspond to the “processed” data presented in the paper, not only is the reader unable to “check the work” of the authors, but she is also unclear as to how the data actually were processed, whether such processing is appropriate, and even whether the data in the 7 spectra were analyzed consistently.

In short, then, on its way from manuscript to published article, the 1993 paper seemed to display problems with the spectral data, as well as unclarity and inconsistency in how those data were processed. If the data and the described analysis methods were correct, they did not support the authors’ conclusions. If either the reported data or the described analysis methods (or both) were incorrect, then a reader of the 1993 paper would still have no good reason to accept the conclusions presented in that paper. Without accurate data and a clear description of how the data are worked up, the “conclusion” hasn’t been properly supported, and a skeptical scientist should regard it as still in need of support.

Thus, according to Silcox and Muller, the 1993 paper failed to make a good scientific case for its conclusions.

Here, recall the roles a scientific paper plays.

One is to set forth claims about what we know and how we know it. The “how we know it” is all about the evidence researchers line up to support their conclusions — the evidence that makes those conclusions count as knowledge.

But another role a scientific paper plays is to establish a priority claim for a new piece of knowledge being added to the body of knowledge the scientific community shares. We don’t generally recognize the priority of the first person who had a hunch that the world was a certain way, or that an instrumental method could be pushed to a particular limit. Rather, we recognize the priority of the scientists who first lined up the evidence to demonstrate that it was so.

Corrections matter because the evidential basis for scientific knowledge matters. They also matter because the score-keeping tied to priority claims can have real consequences for scientists’ careers — for who gets hired, or funded, or cited, or recognized in other ways.

Next up: What kind of inferences can be supported given the exchanges between the authors of the disputed paper and the corrigendum and the referees?
________
*The spectra as originally processed were supposed to be CCD readouts with the background removed. The spectral data were then processed further. So technically, none of the data reported in the paper were completely “raw” data, but that seems to be pretty typical.
________
[1] N. D. Browning, M. F. Chisholm and S. J. Pennycook, “Atomic-resolution chemical analysis using a scanning transmission electron microscope,” Nature, Vol. 366 (11 November 1993), 143-146.

[2] N. D. Browning, M. F. Chisholm and S. J. Pennycook, “Corrigendum: Atomic-resolution chemical analysis using a scanning transmission electron microscope,” Nature, Vol. 444 (9 November 2006), 235.

[3] Geoff Brumfiel, “Data handling causes image problem for top lab,” Nature, Vol. 444 (9 November 2006), 129.

[4] Editors, “Correction or retraction?” Nature, Vol. 444 (9 November 2006), 123-124.

[5] J. Silcox and D. A. Muller, “Brief Communication arising from ‘Corrigendum: Atomic resolution chemical analysis using a scanning transmission electron microscope,’” arXiv:1106.4534v1, submitted 22 June 2011.

[6] Eugenie Samuel Reich, “Investigation Closed,” Nature, Vol. 475 (7 July 2011), 20-22.

[7] Editors, “Activation Energy,” Nature, Vol. 475 (7 July 2011), 5-6.


3 Comments

  1. I think that Nature is very, very bad on this. Many journals have this weird process where they claim to publish technical comments, etc., but almost never do, and they waste many authors’ time dragging them through the process; there is a famous case in, I think, the climate field. Also, many have rules that there must be a reply, so if the targeted authors simply don’t respond, they make their own problem go away. It is totally ridiculous. Nature, it should be noted, has a fake crystal structure on its books. They published a “Brief Communication Arising” (http://www.nature.com/nature/journal/v448/n7154/full/nature06102.html) as well as a totally BS response, in which the responding authors (the PI is now known to be a fraud) state a bunch of crap, and the entire affair comes across as “they said/they said” when in reality the crystal structure did not fit with the laws of physics.

  2. In 1988 we did some high-resolution work at ORNL and subsequently published it in the Journal of Physics. The work was on Ge-Si interfaces. Later I noticed our images being used by the same atomic-resolution group mentioned in the article without ever referencing it, which I thought was not right.
