Evaluating scientific claims (or, do we have to take the scientist’s word for it?)

Recently, we’ve noted that a public composed mostly of non-scientists may find itself asked to trust scientists, in large part because members of that public are not usually in a position to make all their own scientific knowledge. This is not a problem unique to non-scientists, though — once scientists reach the end of the tether of their expertise, they end up having to approach the knowledge claims of scientists in other fields with some mixture of trust and skepticism. (It’s reasonable to ask what the right mixture of trust and skepticism would be in particular circumstances, but there’s not a handy formula with which to calculate this.)

Are we in a position where, outside our own narrow area of expertise, we either have to commit to agnosticism or take someone else’s word for things? If we’re not able to directly evaluate the data, does that mean we have no good way to evaluate the credibility of the scientist pointing to the data to make a claim?

This raises an interesting question for science journalism, not so much about what role it should play as what role it could play.

If only a trained scientist could evaluate the credibility of scientific claims (and then perhaps only in the particular scientific field in which one was trained), this might reduce science journalism to a mere matter of publishing press releases, or of reporting on scientists’ social events, sense of style, and the like. Alternatively, if the public looked to science journalists not just to communicate the knowledge claims various scientists are putting forward but also to do some evaluative work on our behalf — sorting out credible claims and credible scientists from the crowd — we might imagine that good science journalism demands extensive scientific training (and that we probably need a separate science reporter for each specialized area of science to be covered).

In an era where media outlets are more likely to cut the science desk than expand it, pinning our hopes on legions of science-Ph.D.-earning reporters on the science beat might be a bad idea.

I don’t think our prospects for evaluating scientific credibility are quite that bad.

Scientific knowledge is built on empirical data, and the details of the data (what sort of data is relevant to the question at hand, what kind of data can we actually collect, what techniques are better or worse for collecting the data, how we distinguish data from noise, etc.) can vary quite a lot in different scientific disciplines, and in different areas of research within those disciplines. However, there are commonalities in the basic patterns of reasoning that scientists in all fields use to compare their theories with their data. Some of these patterns of reasoning may be rather sophisticated, perhaps even non-intuitive. (I’m guessing certain kinds of probabilistic or statistical reasoning might fit this category.) But others will be the patterns of reasoning that get highlighted when “the scientific method” is taught.

In other words, even if I can’t evaluate someone else’s raw data to tell you directly what it means, I can evaluate the way that data is used to support or refute claims. I can recognize logical fallacies and distinguish them from instances of valid reasoning. Moreover, this is the kind of thing that a non-scientist who is good at critical thinking (whether a journalist or a member of the public consuming a news story) could evaluate as well.

One way to judge scientific credibility (or lack thereof) is to scope out the logical structure of the arguments a scientist is putting up for consideration. It is possible to judge whether arguments have the right kind of relationship to the empirical data without wallowing in that data oneself. Credible scientists can lay out:

  • Here’s my hypothesis.
  • Here’s what you’d expect to observe if the hypothesis is true. Here, on the other hand, is what you’d expect to observe if the hypothesis is false.
  • Here’s what we actually observed (and here are the steps we took to control the other variables).
  • Here’s what we can say (and with what degree of certainty) about the hypothesis in the light of these results.
  • Here’s the next study we’d like to do to be even more sure.

And, not only will the logical connections between the data and what is inferred from them look plausible to the science writer who is hip to the scientific method, but they ought to look plausible to other scientists — even to scientists who might prefer different hypotheses, or different experimental approaches. If what makes something good science is its epistemology — the process by which data are used to generate and/or support knowledge claims — then even scientists who may disagree with those knowledge claims should still be able to recognize the patterns of reasoning involved as properly scientific. This suggests a couple more things we might ask credible scientists to display:

  • Here are the results of which we’re aware (published and unpublished) that might undermine our findings.
  • Here’s how we have taken their criticisms (or implied criticisms) seriously in evaluating our own results.

If the patterns of reasoning are properly scientific, why wouldn’t all the scientists agree about the knowledge claims themselves? Perhaps they’re taking different sets of data into account, or they disagree about certain of the assumptions made in framing the question. The important thing to notice here is that scientists can disagree with each other about experimental results and scientific conclusions without thinking that the other guy is a bad scientist. The hope is that, in the fullness of time, more data and dialogue will resolve the disagreements. But good, smart, honest scientists can disagree.

This is not to say that there aren’t folks in lab coats whose thinking is sloppy. Indeed, catching sloppy thinking is the kind of thing you’d hope a good general understanding of science would help someone (like a scientific colleague, or a science journalist) to do. At that point, of course, it’s good to have backup — other scientists who can give you their read on the pattern of reasoning, for example. And, to the extent that a scientist — especially one talking “on the record” about the science (whether to a reporter or to other scientists or to scientifically literate members of the public) — displays sloppy thinking, that would tend to undermine his or her credibility.

There are other kinds of evaluation you can probably make of a scientist’s credibility without being an expert in his or her field. Examining a scientific paper to see if the sources cited make the claims that they are purported to make by the paper citing them is one way to assess credibility. Determining whether a scientist might be biased by an employer or a funding source may be harder. But there, I suspect many of the scientists themselves are aware of these concerns and will go the extra mile to establish their credibility by taking the possibility that they are seeing what they want to see very seriously and testing their hypotheses fairly stringently so they can answer possible objections.

It’s harder still to get a good read on the credibility of scientists who present evidence and interpretations with the right sort of logical structure but who have, in fact, fabricated or falsified that evidence. Being wary of results that seem too good to be true is probably a good strategy here. Also, once a scientist is caught in such misconduct, it’s entirely appropriate not to trust another word that comes from his or her mouth.

One of the things fans of science have tended to like is that it’s a route to knowledge that is, at least potentially, open to any of us. It draws on empirical data we can get at through our senses and on our powers of rational thinking. As it happens, the empirical data have gotten pretty complicated, and there’s usually a good bit of technology between the thing in the world we’re trying to observe and the sense organs we’re using to observe it. However, those powers of rational thinking are still at the center of how the scientific knowledge gets built. Those powers need careful cultivation, but to at least a first approximation they may be enough to help us tell the people doing good science from the cranks.

What a scientist knows about science (or, the limits of expertise).

In a world where scientific knowledge might be useful in guiding decisions we make individually and collectively, one reason non-scientists might want to listen to scientists is that scientists are presumed to have the expertise to sort reliable knowledge claims from snake oil. If you’re not in the position to make your own scientific knowledge, your best bet might be to have a scientific knowledge builder tell you what counts as good science.

But, can members of the public depend on any scientist off the street (or out of the lab) to vet all the putative scientific claims for credibility?

Here, we have to grapple with the relationship between Science and particular scientific disciplines — and especially with the question of whether there is enough of a common core between different areas of science that scientists trained in one area can be trusted to recognize the strengths and weaknesses of work in another scientific area. How important is all that specialization research scientists do? Can we trust that, to some extent, all science follows the same rules, thus equipping any scientist to weigh in intelligently about any given piece of it?

It’s hard to give you a general answer to that question. Instead, as a starting point for discussion, let me lay out the competence I personally am comfortable claiming, in my capacity as a trained scientist.

As someone trained in a science, I am qualified:

  1. to say an awful lot about the research projects I have completed (although perhaps a bit less about them when they were still underway).
  2. to say something about the more or less settled knowledge, and about the live debates, in my research area (assuming, of course, that I have kept up with the literature and professional meetings where discussions of research in this area take place).
  3. to say something about the more or less settled (as opposed to “frontier”) knowledge for my field more generally (again, assuming I have kept up with the literature and the meetings).
  4. perhaps, to weigh in on frontier knowledge in research areas other than my own, if I have been very diligent about keeping up with the literature and the meetings and about communicating with colleagues working in these areas.
  5. to evaluate scientific arguments in areas of science other than my own for logical structure and persuasiveness (though I must be careful to acknowledge that there may be premises of these arguments — pieces of theory or factual claims from observations or experiments that I’m not familiar with — that I’m not qualified to evaluate).
  6. to recognize, and be wary of, logical fallacies and other less obvious pseudo-scientific moves (e.g., I should call shenanigans on claims that weaknesses in theory T1 count as support for alternative theory T2).
  7. to recognize that experts in fields of science other than my own generally know what the heck they’re talking about.
  8. to trust scientists in fields other than my own to rein in scientists in those fields who don’t know what they are talking about.
  9. to face up to the reality that, as much as I may know about the little piece of the universe I’ve been studying, I don’t know everything (which is part of why it takes a really big community to do science).

This list of my qualifications is an expression of my comfort level more than anything else. It’s not elitist — good training and hard work can make a scientist out of almost anyone. But, it recognizes that with as much as there is to know, you can’t be an expert on everything. Knowing how far the tether of your expertise extends is part of being a responsible scientist.

So, what kind of help can a scientist give the public in evaluating what is presented as scientific knowledge? What kind of trouble can a scientist encounter in trying to sort out the good from the bad science for the public? Does the help scientists offer here always help?

What the chlorite-iodide reaction taught me.

Since 2011 is the International Year of Chemistry, the good folks at CENtral Science are organizing a blog carnival on the theme, “Your favorite chemical reaction”.

My favorite chemical reaction is the chlorite-iodide reaction, and it’s my favorite because of the life lessons it has taught me.

The reaction has overall stoichiometry:
ClO2- + 4 I- + 4 H+ = 2 I2 + Cl- + 2 H2O
Written out that way, as a simple set of reactants and products, it doesn’t look that exciting, but when the reaction is run in a continuous flow stirred tank reactor (CSTR), where reactants are flowed in and products are removed, it can exhibit oscillatory behavior. The oscillations in the concentrations of iodine (I2) and iodide (I-) can be tracked experimentally, the former by measuring UV absorbance at 460 nm, the latter by measuring the potential of an ion-specific electrode.
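For readers who want to check the bookkeeping on an overall equation like this one, here is a minimal sketch in Python. It assumes the charged forms of the species (chlorite ClO2-, iodide I-, chloride Cl-) and a coefficient of 2 on water, which is what the atom and charge balance requires. The species table and function names are made up for illustration.

```python
from collections import Counter

# Each species maps to (element counts, net charge). Entered by hand;
# this is an illustrative table, not a general formula parser.
SPECIES = {
    "ClO2-": (Counter({"Cl": 1, "O": 2}), -1),
    "I-": (Counter({"I": 1}), -1),
    "H+": (Counter({"H": 1}), +1),
    "I2": (Counter({"I": 2}), 0),
    "Cl-": (Counter({"Cl": 1}), -1),
    "H2O": (Counter({"H": 2, "O": 1}), 0),
}


def side_totals(side):
    """Sum atoms and charge over a list of (coefficient, species) pairs."""
    atoms, charge = Counter(), 0
    for coeff, name in side:
        counts, q = SPECIES[name]
        for element, n in counts.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(reactants, products):
    """True when both sides carry the same atoms and the same net charge."""
    return side_totals(reactants) == side_totals(products)


REACTANTS = [(1, "ClO2-"), (4, "I-"), (4, "H+")]
PRODUCTS = [(2, "I2"), (1, "Cl-"), (2, "H2O")]

print(is_balanced(REACTANTS, PRODUCTS))
```

Changing the water coefficient to 1 makes the check fail, because hydrogen and oxygen no longer match on the two sides.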

An early study of the kinetics of this reaction determined that it “is catalyzed by the iodine product, and the autocatalysis is inhibited by iodide ion.” (Kern and Kim 1965, 5309) In 1985, Epstein and Kustin proposed the first mechanism for this reaction to account for the oscillatory behavior, one that includes 13 elementary steps and 12 chemical species. Two years later, Citri and Epstein proposed an improved model mechanism with 8 elementary mechanistic steps and 10 chemical species. The Citri-Epstein model proposes a different set of elementary steps to describe the oxidation of iodide by chlorite. In addition, it eliminates the intermediate IClO2, “whose existence has been called into question elsewhere.” (Citri and Epstein 1987, 6035) The resulting model mechanism seemed to produce better agreement between predicted and measured concentrations of iodide and iodine than that given by the earlier model.

The chlorite-iodide reaction also happens to have been the reaction at the center of most of my research for my Ph.D. in chemistry.

Here are some of the lessons I learned working with the chlorite-iodide reaction:

  1. Experimental tractability matters, at least when you’re doing experiments. The general thrust of my research was to work out clever ways to perform empirical tests of proposed mechanisms for oscillating chemical reactions, but the chlorite-iodide reaction was not the first reaction I worked with. I started out trying to make some clever measurements on another reaction, the minimal bromate oscillator (MBO). However, after maybe six months of fighting to set up the conditions where the MBO would give me oscillations, I had to make my peace with the idea that its “small” region in phase-space with oscillatory behavior was really, really small. Luckily, in my reading of the relevant literature on the experimental and theoretical approaches we were taking, I had come across a similar inorganic chemical oscillator with an “ample” oscillatory region, one which promised to make my time in the lab exponentially less frustrating. That’s right, the chlorite-iodide reaction was my rebound system, but we stayed together and made it work.
  2. When your original research project gets stuck, it’s good to have a detailed plan for how to move forward when you talk to the boss. My advisor was really keen for that minimal bromate oscillator that was making my life in the lab a nightmare. So, when I met with him to tell him I wanted to break up with the MBO and take up with the chlorite-iodide reaction, I had to make the case for the new system. I came armed with the articles that described its substantial oscillatory region, and the articles that described the MBO’s tiny one. I prepared some calculations describing how much more precise our pump-rates would need to be to find MBO oscillations, and catalogues that listed the prices of the new equipment we would need. I brought the articles proposing mechanisms for the chlorite-iodide reaction so I could display the virtues of their elementary mechanistic steps from the point of view of the kind of experimental probing we had in mind. Because I did my homework and was able to make a persuasive case, the boss was happy to let me start working with the chlorite-iodide system right away, and to kiss the minimal bromate oscillator goodbye forever.
  3. Experimental tractability is relative, not absolute (and Materials and Methods often leave stuff out). The chlorite-iodide reaction was certainly easier to work with — within a week, I found oscillations where the literature said I would — but it was not completely smooth sailing. There were pumps that didn’t perform as they should, which meant I was taking them apart and swapping out components. There were days when I couldn’t get any reliable measurements because the pH meter I used with my iodide-specific electrode had been left on for too many hours in a row. And, there were little details I discovered in setting up experimental runs day in and day out that were not fully discussed in the “materials and methods” section of the published papers describing the chlorite-iodide reaction. Reproducibility is hard.
  4. Reactions happen in three-dimensional space, not just in reaction space. One of the experimental challenges of the chlorite-iodide reaction is that, to find the dynamical behavior you’re looking for, you have to stir the reactants in the tank reactor at the right speed. Stirring much faster or much slower will change the dynamics of the reaction, as will using a reactor with significantly different internal geometry. (“Dimples” protruding into the cylindrical space inside the reactor are supposed to help you mix the reactants more effectively, rather than giving them the opportunity to hang out unmixed by the walls.) Appropriate stirring speed was not one of the parameters spelled out by the papers whose descriptions of the reaction I was using to get started, nor was reactor geometry. I had to do experiments to work out the stirring speed that (with the geometry of the reaction vessel we had on hand) produced the same behavior as these other papers were reporting. Once I found that stir-speed, I kept that constant for my experimental runs. Also, I made detailed measurements of the reactor we were using, which turned out to be a really good thing when that reactor broke. I was able to take those measurements to the glass-blower’s shop and get replacements (plural) made.
  5. Time well spent in setting things up is frequently rewarded with good data. It was absolutely worth it to spend a couple hours at the beginning of each run calibrating pump flow-rates and checking out the iodide-selective electrode performance with standard solutions, since this let me apply the experimental conditions I wanted to and make accurate measurements. Did I mention that reproducibility is hard?
  6. Qualitative measurements require patience, too. Among other things, I was interested in mapping the edges of regions in phase-space where the chlorite-iodide reaction displayed different kinds of behavior. On one edge, there was a bifurcation where you would find steady state behavior (i.e., stable concentrations of reaction species) that, coming up on the bifurcation point, became tiny-amplitude oscillations that grew. On the other edge, the oscillations had attained their maximum amplitude, but their period (that is, the lag between oscillatory peaks) grew longer and longer until there weren’t any more peaks and the reaction settled into another steady state. The thing was, it was hard to know when you were set up with conditions where the period of oscillation was just really, really long (sometimes around 20 minutes between peaks, if memory serves) or when you had found the steady state. You had to be patient. While I was exploring that edge of the reaction in phase-space, I started thinking maybe that was a good metaphor for certain aspects of graduate school.
  7. You probably can’t measure everything you’d want to measure, but sometimes measuring one more thing can help a lot. As I mentioned above, the Citri-Epstein mechanism for the chlorite-iodide reaction posited ten chemical species in the various steps of the reaction. In a perfect world, you’d want to be able to measure each of those species simultaneously over time as the reaction proceeded. But, as one learns pretty quickly in grad school, this is not a perfect world. When I started with this reaction, published papers were reporting simultaneous dynamical measurements of only two of those species (iodide and iodine). Chloride is one of the hypothesized intermediates, and there are chloride-specific electrodes on the market. However, the membrane in a chloride-specific electrode also reacts with … iodide. Other intermediate species might be measured by various chemical assays if the progress of the reaction could be halted in the samples being assayed. By the end of my graduate research, I had figured out a way to use a flow-through cuvette and a seat-of-the-pants spectral deconvolution technique to measure the time-series of one additional species in the reaction, the chlorite ion (ClO2-). This was enough to do some evaluation of the proposed mechanism that was not possible without it.
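To give a flavor of what spectral deconvolution means in the last lesson, here is a minimal sketch in Python. It is not the technique I actually used. It assumes the simplest possible case: two absorbing species whose contributions add linearly (Beer-Lambert), with absorbance read at a handful of wavelengths and a least-squares fit for the two concentrations. The function name, the generic labels "species a" and "species b", and every number below are invented for illustration.

```python
def unmix_two_species(absorbances, eps_a, eps_b, path_cm=1.0):
    """Least-squares concentrations of two absorbers from one spectrum.

    Assumed model (Beer-Lambert, additive absorbances):
        A(w) = path_cm * (eps_a(w) * c_a + eps_b(w) * c_b)
    where eps_a and eps_b are molar absorptivities at each wavelength w.
    """
    # Scale the absorptivities by path length to get the two fit columns.
    xa = [path_cm * e for e in eps_a]
    xb = [path_cm * e for e in eps_b]

    # Normal equations for a two-parameter linear least-squares fit.
    saa = sum(x * x for x in xa)
    sbb = sum(x * x for x in xb)
    sab = sum(x * y for x, y in zip(xa, xb))
    say = sum(x * y for x, y in zip(xa, absorbances))
    sby = sum(x * y for x, y in zip(xb, absorbances))

    det = saa * sbb - sab * sab
    if abs(det) <= 1e-12 * saa * sbb:
        # The two spectra are (nearly) proportional, so they cannot be separated.
        raise ValueError("spectra are too similar to deconvolve")

    c_a = (say * sbb - sby * sab) / det
    c_b = (sby * saa - say * sab) / det
    return c_a, c_b


# Invented absorptivities (L/(mol*cm)) at four wavelengths, not real data.
EPS_A = [100.0, 400.0, 700.0, 300.0]
EPS_B = [900.0, 500.0, 100.0, 50.0]

# Build a synthetic mixed spectrum from known concentrations (mol/L).
TRUE_A, TRUE_B = 2e-4, 1e-4
MIXED = [ea * TRUE_A + eb * TRUE_B for ea, eb in zip(EPS_A, EPS_B)]

fit_a, fit_b = unmix_two_species(MIXED, EPS_A, EPS_B)
print(fit_a, fit_b)
```

With noise-free synthetic input the fit simply recovers the concentrations that went in. Real spectra have noise, baseline drift, and more than two absorbers, which is where the seat-of-the-pants part comes in.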

Later on, when I became a philosopher of science, this work gave me some insights into the circumstances in which chemists are happy to be instrumentalists (e.g., recognizing that the fact that a proposed reaction mechanism was consistent with the observed kinetics of the reaction was no guarantee that this was the actual mechanism by which the reaction proceeded) and the circumstances in which they lean towards being realists (by finding ways to distinguish better proposed mechanisms from worse ones). But back when I was actually getting glassware dirty running the chlorite-iodide reaction, this reaction helped me learn how to be a scientist.

_____

Works cited:
Citri, Ofra, and Irving R. Epstein (1987) “Dynamical Behavior in the Chlorite-Iodide Reaction: A Simplified Mechanism”, Journal of Physical Chemistry 91: 6034-6040.

Epstein, Irving R., and Kenneth Kustin (1985) “A Mechanism for Dynamical Behavior in the Oscillatory Chlorite-Iodide Reaction”, Journal of Physical Chemistry 89: 2275-2282.

Kern, David M., and Chang-Hwan Kim (1965) “Iodine Catalysis in the Chlorite-Iodide Reaction”, Journal of the American Chemical Society 87(23): 5309-5313.

Trust me, I’m a scientist.

In an earlier post, I described an ideal of the tribe of science that the focus of scientific discourse should be squarely on the content — the hypotheses scientists are working with, the empirical data they have amassed, the experimental strategies they have developed for getting more information about our world — rather than on the particular details of the people involved in this discourse. This ideal is what sociologist of science Robert K. Merton* described as the “norm of universalism”.

Ideals, being ideals, can be hard to live up to. Anonymous peer review of scientific journal articles notwithstanding, there are conversations in the tribe of science where it seems to matter a lot who is talking, not just what she’s saying about the science. Some scientists were trained by pioneers in their fields, or hired to work in prestigious and well-funded university departments. Some have published surprising results that have set in motion major changes in the scientific understanding of a particular phenomenon, or have won Nobel Prizes.

The rest can feel like anonymous members in a sea of scientists, doing the day to day labor of advancing our knowledge without benefit of any star power within the community. Indeed, probably lots of scientists prefer the task of making the knowledge, having no special need to have their names widely known within their fields and piled with accolades.

But there’s a peculiar consequence of the idea that scientists are all in the knowledge-building trenches together, focused on the common task rather than on self-aggrandizement. When scientists are happily ensconced in the tribe of science, very few of them take themselves to be stars. But when the larger society, made up mostly of non-scientists, encounters a scientist — any scientist — that larger society might take him to be a star.

Merton touched on this issue when he described another norm of the tribe of science, disinterestedness. One way to think about the norm of disinterestedness is that scientists aren’t doing science primarily to get the big bucks, or fame, or attractive dates. Merton’s description of this community value is a bit more subtle. He notes that disinterestedness is different from altruism, and that scientists needn’t be saints.

The best way to understand disinterestedness might be to think of how a scientist working within her tribe is different from an expert out in the world dealing with laypeople. The expert, knowing more than the layperson, could exploit the layperson’s ignorance or his tendency to trust the judgment of the expert. The expert, in other words, could put one over on the layperson for her own benefit. This is how snake oil gets sold.

The scientist working within the tribe of science can expect no such advantage. Thus, trying to put one over on other scientists is a strategy that shouldn’t get you far. By necessity, the knowledge claims you advance are going to be useful primarily in terms of what they add to the shared body of scientific knowledge, if only because your being accountable to the other scientists in the tribe means that there is no value added to the claims from using them to play your scientific peers for chumps.

Merton described situations in which the bona fides of the tribe of science were used in the service of non-scientific ends:

Science realizes its claims. However, its authority can be and is appropriated for interested purposes, precisely because the laity is often in no position to distinguish spurious from genuine claims to such authority. The presumably scientific pronouncements of totalitarian spokesmen on race or economy or history are for the uninstructed laity of the same order as newspaper reports of an expanding universe or wave mechanics. In both instances, they cannot be checked by the man-in-the-street and in both instances, they may run counter to common sense. If anything, the myths will seem more plausible and are certainly more comprehensible to the general public than accredited scientific theories, since they are closer to common-sense experience and to cultural bias. Partly as a result of scientific achievements, therefore, the population at large becomes susceptible to new mysticisms expressed in apparently scientific terms. The borrowed prestige of science bestows prestige on the unscientific doctrine. (p. 277)

(Bold emphasis added)

The success of science — the concentrated expertise of the tribe — means that those outside of it may take “scientific” claims at face value. Unable to make an independent evaluation of their credibility, lay people can easily fall prey to a wolf in scientist’s clothing, to a huckster assumed to be committed first and foremost to the facts (as scientists try to be) who is actually distorting them to look after his own ends.

This presents a serious challenge for non-scientists — and for scientists, too.

If the non-scientist can’t determine whether a purportedly scientific claim is a good one — whether, for example, it is supported by the empirical evidence — the non-scientist has to choose between accepting that claim on the authority of someone who claims to be a scientist (which in itself raises another evaluative problem for the non-scientist — what kind of credentials do you need to see from the guy wearing the lab coat to believe that he’s a proper scientist?), or setting aside all putative scientific claims and remaining agnostic about them. You trust that the “Science” label on a claim tells you something about its quality, or you recognize that it conveys even less useful information to you than a label that says, “Now with Jojoba!”

If late-night infomercials and commercial websites are any indication, there are not strong labeling laws covering what can be labeled as “Science”, at least in a sales pitch aimed at the public at large.** This leaves open the possibility that claims the guy in the white lab coat says are backed by Science would not be recognized by other scientists as backed by science.

The problem this presents for scientists is two-fold.

On the one hand, scientists are trying to get along in a larger society where some of what they discover in their day jobs (building knowledge) could end up being relevant to how that larger society makes decisions. If we want our governments to set sensible policy as far as tackling disease outbreaks, or building infrastructure that won’t crumble in floods, or ensuring that natural resources are utilized sustainably, it would be good for that policy to be informed by the best relevant knowledge we have on the subject. Policy makers, in other words, want to be able to rely on science — something that scientists want, too (since usually they are working as hard as they are to build the knowledge so that the knowledge can be put to good use). But that can be hard to do if some members of the tribe of science go rogue, trading on their scientific credibility to sell something as science that is not.

Even if policy makers have some reasonable way to spot the people slapping the Science label on claims that aren’t scientific, there will be problems in a democratic society where the public at large can’t reliably tell scientists from purveyors of snake-oil.

In such situations, the public at large may worry that anyone with scientific credentials could be playing them for suckers. Scientists who they don’t already know by reputation may be presumed to be looking out for their own interests rather than to be advancing scientific knowledge.

A public distrustful of scientists’ good intentions or trustworthiness in interactions with non-scientists will convey that distrust to the people making policy for them.

This means that scientists have a strong interest in identifying the members of the tribe of science who go rogue and try to abuse the public’s trust. People presenting themselves as scientists while selling unscientific claims are diluting the brand of Science. They undermine the reputation science has for building reliable knowledge. They undercut the claim other scientists make that, in their capacity as scientists, they hold themselves accountable to the way the world really is — to the facts, no matter how inconvenient they may be.

Indeed, if the tribe of science can’t make the case that it is serious about the task of building reliable knowledge about the world and using that knowledge to achieve good things for the public, the larger public may decide that putting up public monies to support scientific research is a bad idea. This, in turn, could lead to a world where most of the scientific knowledge is built with private money, by private industry — in which case, we might have to get most of our scientific knowledge from companies that actually are trying to sell us something.

_____
*Robert K. Merton, “The Normative Structure of Science,” in The Sociology of Science: Theoretical and Empirical Investigations. University of Chicago Press (1979), 267-278.

**There are, however, rules that require the sellers of certain kinds of products to state clearly when they are making claims that have not been evaluated by the Food and Drug Administration.