How we decide (to falsify).

At the tail-end of a three-week vacation from all things online (something that I badly needed at the end of teaching an intensive five-week online course), the BBC news reader on the radio pulled me back in. I was driving my kid home from the end-of-season swim team banquet, engaged in a conversation about the awesome coaches, when my awareness was pierced by the words “Jonah Lehrer” and “resigned” and “falsified”.

It appears that the self-plagiarism brouhaha was not Jonah Lehrer’s biggest problem. On top of recycling work in ways that may not have conformed to his contractual obligations, Lehrer has also admitted to making up quotes in his recent book Imagine. Here are the details as I got them from the New York Times Media Decoder blog:

An article in Tablet magazine revealed that in his best-selling book, “Imagine: How Creativity Works,” Mr. Lehrer had fabricated quotes from Bob Dylan, one of the most closely studied musicians alive. …

In a statement released through his publisher, Mr. Lehrer apologized.

“The lies are over now,” he said. “I understand the gravity of my position. I want to apologize to everyone I have let down, especially my editors and readers.”

He added, “I will do my best to correct the record and ensure that my misquotations and mistakes are fixed. I have resigned my position as staff writer at The New Yorker.” …

Mr. Lehrer might have kept his job at The New Yorker if not for the Tablet article, by Michael C. Moynihan, a journalist who is something of an authority on Mr. Dylan.

Reading “Imagine,” Mr. Moynihan was stopped by a quote cited by Mr. Lehrer in the first chapter. “It’s a hard thing to describe,” Mr. Dylan said. “It’s just this sense that you got something to say.”

After searching for a source, Mr. Moynihan could not verify the authenticity of the quote. Pressed for an explanation, Mr. Lehrer “stonewalled, misled and, eventually, outright lied to me” over several weeks, Mr. Moynihan wrote, first claiming to have been given access by Mr. Dylan’s manager to an unreleased interview with the musician. Eventually, Mr. Lehrer confessed that he had made it up.

Mr. Moynihan also wrote that Mr. Lehrer had spliced together Dylan quotes from separate published interviews and, when the quotes were accurate, he took them well out of context. Mr. Dylan’s manager, Jeff Rosen, declined to comment.

In the practice of science, falsification is recognized as a “high crime” and is included in every official definition of scientific misconduct you’re likely to find. The reason for this is simple: scientists are committed to supporting their claims about what the various bits of the world are like, and about how they work, with empirical evidence from the world — so making up that “evidence” rather than going to the trouble to gather it is out of bounds.

Despite his undergraduate degree in neuroscience, Jonah Lehrer is not operating as a scientist. However, he is operating as a journalist — a science journalist at that — and journalism purports to recognize a similar kind of relationship to evidence. Presenting words as a quote from a source is making a claim that the person identified as the source actually said those things, actually made those claims or shared those insights. Presumably, a journalist includes such quotes to bolster an argument. Maybe if Jonah Lehrer had simply written a book presenting his thoughts about creativity, readers would have had no special reason to believe it. Supporting his views with the (purported) utterances of someone widely recognized as a creative genius, though, might make them more credible.

(Here, Eva notes drily that this incident might serve to raise Jonah Lehrer’s credibility on the subject of creativity.)

The problem, of course, is that a fake quote can’t really add credibility in the way it appears to when the quote is authentic. Indeed, once discovered as fake, it has precisely the opposite effect. As with falsification in science, falsification in journalism can only achieve its intended goal as long as its true nature remains undetected.

There is no question in my mind about the wrongness of falsification here. Rather, the question I grapple with is: why do they do it?

In science, after falsified data is detected, one sometimes hears an explanation in terms of extreme pressure to meet a deadline (say, for a big grant application, or for submission of a tenure dossier) or to avoid being scooped on a discovery that is so close one can almost taste it … except for the damned experiments that have become uncooperative. Experiments can be hard, there is no denying it, and the awarding of scientific credit to the first across the finish line (but not to the others right behind the first) raises the prospect that all of one’s hard work may be in vain if one can’t get those experiments to work first. Given the choice between getting no tangible credit for a few years’ worth of work (because someone else got her experiments to work first) and making up a few data points, a scientist might well feel tempted to cheat. That scientific communities regard falsifying data as such a serious crime is meant to reduce that temptation.

There is another element that may play an important role in falsification, one brought to my attention some years ago in a talk given by C. K. Gunsalus: the scientist may have such strong intuitions about the bit of the world she is trying to describe that gathering the empirical data to support these intuitions seems like a formality. If you’re sure you know the answer, the empirical data are only useful insofar as they help convince others who aren’t yet convinced. The problem here is that the empirical data are how we know whether our accounts of the world fit the actual world. If all we have is hunches, with no way to weed out the hunches that don’t fit with the details of reality, we’re no longer in the realm of science.

I wonder if this is close to the situation in which Jonah Lehrer found himself. Maybe he had strong intuitions about what kind of thing creativity is, and about what a creative guy like Bob Dylan would say when asked about his own exercise of creativity. Maybe these intuitions felt like a crucial part of the story he was trying to tell about creativity. Maybe he even looked to see if he could track down apt quotes from Bob Dylan expressing what seemed to him to be the obvious Dylanesque view … but, coming up short on this quotational data, he was not prepared to leave such an important intuition dangling without visible support, nor was he prepared to excise it. So he channeled Bob Dylan and wrote the thing he was sure in his heart Bob Dylan would have said.

At the time, it might have seemed a reasonable way to strengthen the narrative. As it turns out, though, it was a course of action that so weakened it that the publisher of Imagine, Houghton Mifflin Harcourt, has recalled print copies of the book.

Blogging and recycling: thoughts on the ethics of reuse.

Owing to summer-session teaching and a sprained ankle, I have been less attentive to the churn of online happenings than I usually am, but an email from SciCurious brought to my attention a recent controversy about a blogger’s “self-plagiarism” of his own earlier writing in his blog posts (and in one of his books).

SciCurious asked for my thoughts on the matter, and what follows is very close to what I emailed her in reply this morning. I should note that these thoughts were composed before I took to the Googles to look for links or to read up on the details of the particular controversy playing out. This means that I’ve spoken to what I understand as the general lay of the ethical land here, but I have probably not addressed some of the specific details that people elsewhere are discussing.

Here’s the broad question: Is it unethical for a blogger to reuse in blog posts material she has published before (including in earlier blog posts)?

A lot of people who write blogs are using them with the clear intention (clear at least to themselves) of developing ideas for “more serious” writing projects — books, or magazine articles or what have you. I myself am leaning heavily on stuff I’ve blogged over the past seven-plus years in writing the textbook I’m trying to finish, and plan similarly to draw on old blog posts for at least two other books that are in my head (if I can ever get them out of my head and into book form).

That this is an intended outcome is part of why many blog authors who are lucky enough to get paying blogging gigs, especially those of us from academia, fight hard for ownership of what they post and for the explicit right to reuse what they’ve written.

So, I wouldn’t generally judge reuse of what one has written in blog posts as self-plagiarism, nor as unethical. Of course, my book(s) will explicitly acknowledge my blogs as the site-of-first-publication for earlier versions of the arguments I put forward. (My book(s) will also acknowledge the debt I owe to commenters on my posts who have pushed me to think much more carefully about the issues I’ve posted on.)

That said, if one is writing in a context where one has agreed to a rule that says, in effect, “Everything you write for us must be shiny and brand-new and never published by you before elsewhere in any form,” then one is obligated not to recycle what one has written elsewhere. That’s what it means to agree to a rule. If you think it’s a bad rule, you shouldn’t agree to it — and indeed, perhaps you should mount a reasoned argument as to why it’s a bad rule. Agreeing to follow the rule and then not following the rule, however, is unethical.

There are venues (including the Scientific American Blog Network) that are OK with bloggers of long standing dusting off posts from the archives. I’ve exercised this option more than once, though I usually make an effort to significantly update, expand, or otherwise revise those posts I recycle (if for no other reason than I don’t always fully agree with what that earlier time-slice of myself wrote).

This kind of reuse is OK with my corporate master. Does that necessarily make it ethical?

Potentially it would be unethical if it imposed a harm on my readers — that is, if they (you) were harmed by my reposting those posts of yore. But, I think that would require either that I had some sort of contract (express or implied) with my readers that I only post thoughts I have never posted before, or that my reposts mislead them about what I actually believe at the moment I hit the “publish” button. I don’t have such a contract with my readers (at least, I don’t think I do), and my revision of the posts I recycle is intended to make sure that they don’t mislead readers about what I believe.

Back-linking to the original post is probably good practice (from the point of view of making reuse transparent) … but I don’t always do this.

One reason is that the substantial revisions make the new posts substantially different — making different claims, coming to different conclusions, offering different reasons. The old post is an ancestor, but it’s not the same creature anymore.

Another reason is that some of the original posts I’m recycling are from my ancient Blogspot blog, from whose backend I am locked out after a recent Google update/migration — and I fear that the blog itself may disappear, which would leave my updated posts with back-links to nowhere. Bloggers tend to view back-links to nowhere as a very bad thing.

The whole question of “self-plagiarism” as an ethical problem is an interesting one, since I think there’s a relevant difference between self-plagiarism and ethical reuse.

Plagiarism, after all, is use of someone else’s words or ideas (or data, or source-code, etc.) without proper attribution. If you’re reusing your own words or ideas (or whatnot), it’s not like you’re misrepresenting them as your own when they’re really someone else’s.

There are instances, however, where self-reuse gets people rightly exercised. For example, some scientists reuse their own stuff to create the appearance in the scientific literature that they’ve conducted more experimental studies than they actually have, or that there are more published results supporting their hypotheses than there really are. This kind of artificial multiplication of scientific studies is ethically problematic because it is intended to mislead (and indeed, may succeed in misleading), not because the scientists involved haven’t given fair credit to the earlier time-slices of themselves. (A recent editorial for ACS Nano gives a nice discussion of other problematic aspects of “self-plagiarism” within the context of scientific publishing.)

The right ethical diagnosis of the controversy du jour may depend in part on whether journalistic ethics forbid reuse (explicitly or implicitly) — and if so, on whether (or in what conditions) bloggers count as journalists. At some level, this goes beyond what is spelled out in one’s blogging contract and turns also on the relationship between the blogger and the reader. What kind of expectations can the reader have of the blogger? What kind of expectations ought the reader to have of the blogger? To the extent that blogging is a conversation of a sort (especially when commenting is enabled), is it appropriate for that conversation to loop back to territory visited before, or is the blogger obligated always to break new ground?

And, if the readers are harmed when the blogger recycles her own back-catalogue, what exactly is the nature of that harm?

Evaluating scientific claims (or, do we have to take the scientist’s word for it?)

Recently, we’ve noted that a public composed mostly of non-scientists may find itself asked to trust scientists, in large part because members of that public are not usually in a position to make all their own scientific knowledge. This is not a problem unique to non-scientists, though — once scientists reach the end of the tether of their expertise, they end up having to approach the knowledge claims of scientists in other fields with some mixture of trust and skepticism. (It’s reasonable to ask what the right mixture of trust and skepticism would be in particular circumstances, but there’s not a handy formula with which to calculate this.)

Are we in a position where, outside our own narrow area of expertise, we either have to commit to agnosticism or take someone else’s word for things? If we’re not able to directly evaluate the data, does that mean we have no good way to evaluate the credibility of the scientist pointing to the data to make a claim?

This raises an interesting question for science journalism, not so much about what role it should play as what role it could play.

If only a trained scientist could evaluate the credibility of scientific claims (and then perhaps only in the particular scientific field in which one was trained), this might reduce science journalism to a mere matter of publishing press releases, or of reporting on scientists’ social events, sense of style, and the like. Alternatively, if the public looked to science journalists not just to communicate the knowledge claims various scientists are putting forward but also to do some evaluative work on our behalf — sorting out credible claims and credible scientists from the crowd — we might imagine that good science journalism demands extensive scientific training (and that we probably need a separate science reporter for each specialized area of science to be covered).

In an era where media outlets are more likely to cut the science desk than expand it, pinning our hopes on legions of science-Ph.D.-earning reporters on the science beat might be a bad idea.

I don’t think our prospects for evaluating scientific credibility are quite that bad.

Scientific knowledge is built on empirical data, and the details of the data (what sort of data is relevant to the question at hand, what kind of data we can actually collect, what techniques are better or worse for collecting the data, how we distinguish data from noise, etc.) can vary quite a lot in different scientific disciplines, and in different areas of research within those disciplines. However, there are commonalities in the basic patterns of reasoning that scientists in all fields use to compare their theories with their data. Some of these patterns of reasoning may be rather sophisticated, perhaps even non-intuitive. (I’m guessing certain kinds of probabilistic or statistical reasoning might fit this category.) But others will be the patterns of reasoning that get highlighted when “the scientific method” is taught.

In other words, even if I can’t evaluate someone else’s raw data to tell you directly what it means, I can evaluate the way that data is used to support or refute claims. I can recognize logical fallacies and distinguish them from instances of valid reasoning. Moreover, this is the kind of thing that a non-scientist who is good at critical thinking (whether a journalist or a member of the public consuming a news story) could evaluate as well.

One way to judge scientific credibility (or lack thereof) is to scope out the logical structure of the arguments a scientist is putting up for consideration. It is possible to judge whether arguments have the right kind of relationship to the empirical data without wallowing in that data oneself. Credible scientists can lay out:

  • Here’s my hypothesis.
  • Here’s what you’d expect to observe if the hypothesis is true. Here, on the other hand, is what you’d expect to observe if the hypothesis is false.
  • Here’s what we actually observed (and here are the steps we took to control the other variables).
  • Here’s what we can say (and with what degree of certainty) about the hypothesis in the light of these results.
  • Here’s the next study we’d like to do to be even more sure.
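
To make that pattern concrete, here is a minimal sketch in Python (my own toy illustration, not anything drawn from a real study): a made-up coin-flipping “experiment” in which the hypothesis, the predicted observations, the actual (invented) observations, and the tentative conclusion appear in the order listed above.

```python
# Toy illustration of the reasoning pattern above; all numbers are invented.
import random

def simulate_fair_coin(n_flips, n_trials, seed=42):
    """Count heads in n_flips, repeated n_trials times, for a fair coin."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_trials)]

# Hypothesis: the coin is fair (P(heads) = 0.5).
# If true, 100 flips should usually land within a modest band around 50 heads;
# if false (a heavily biased coin), we'd expect counts far outside that band.
n_flips = 100
observed_heads = 72  # hypothetical observation

simulated = simulate_fair_coin(n_flips, n_trials=10_000)

# How often would a fair coin give a result at least this far from 50?
p_value = sum(
    abs(h - n_flips / 2) >= abs(observed_heads - n_flips / 2) for h in simulated
) / len(simulated)

print(f"Observed {observed_heads}/{n_flips} heads; "
      f"chance of something this extreme if the coin is fair: {p_value:.4f}")
# A tiny value doesn't prove the coin is biased, but it shows the data fit the
# fair-coin hypothesis poorly, which is the kind of connection between claim
# and evidence a careful reader can check without redoing the experiment.
```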

And, not only will the logical connections between the data and what is inferred from them look plausible to the science writer who is hip to the scientific method, but they ought to look plausible to other scientists — even to scientists who might prefer different hypotheses, or different experimental approaches. If what makes something good science is its epistemology — the process by which data are used to generate and/or support knowledge claims — then even scientists who may disagree with those knowledge claims should still be able to recognize the patterns of reasoning involved as properly scientific. This suggests a couple more things we might ask credible scientists to display:

  • Here are the results of which we’re aware (published and unpublished) that might undermine our findings.
  • Here’s how we have taken the criticisms they raise (explicit or implied) seriously in evaluating our own results.

If the patterns of reasoning are properly scientific, why wouldn’t all the scientists agree about the knowledge claims themselves? Perhaps they’re taking different sets of data into account, or they disagree about certain of the assumptions made in framing the question. The important thing to notice here is that scientists can disagree with each other about experimental results and scientific conclusions without thinking that the other guy is a bad scientist. The hope is that, in the fullness of time, more data and dialogue will resolve the disagreements. But good, smart, honest scientists can disagree.

This is not to say that there aren’t folks in lab coats whose thinking is sloppy. Indeed, catching sloppy thinking is the kind of thing you’d hope a good general understanding of science would help someone (like a scientific colleague, or a science journalist) to do. At that point, of course, it’s good to have backup — other scientists who can give you their read on the pattern of reasoning, for example. And, to the extent that a scientist — especially one talking “on the record” about the science (whether to a reporter or to other scientists or to scientifically literate members of the public) — displays sloppy thinking, that would tend to undermine his or her credibility.

There are other kinds of evaluation you can probably make of a scientist’s credibility without being an expert in his or her field. Examining a scientific paper to see if the sources cited make the claims that they are purported to make by the paper citing them is one way to assess credibility. Determining whether a scientist might be biased by an employer or a funding source may be harder. But there, I suspect, many of the scientists themselves are aware of these concerns and will go the extra mile to establish their credibility: taking seriously the possibility that they are seeing what they want to see, and testing their hypotheses stringently enough to answer possible objections.

It’s harder still to get a good read on the credibility of scientists who present evidence and interpretations with the right sort of logical structure but who have, in fact, fabricated or falsified that evidence. Being wary of results that seem too good to be true is probably a good strategy here. Also, once a scientist is caught in such misconduct, it’s entirely appropriate not to trust another word that comes from his or her mouth.

One of the things fans of science have tended to like is that it’s a route to knowledge that is, at least potentially, open to any of us. It draws on empirical data we can get at through our senses and on our powers of rational thinking. As it happens, the empirical data have gotten pretty complicated, and there’s usually a good bit of technology between the thing in the world we’re trying to observe and the sense organs we’re using to observe it. However, those powers of rational thinking are still at the center of how the scientific knowledge gets built. Those powers need careful cultivation, but to at least a first approximation they may be enough to help us tell the people doing good science from the cranks.