Pennywise and pound-foolish: misidentified cells and competitive pressures in scientific knowledge-building.

The overarching project of science is building reliable knowledge about the world, but the way this knowledge-building happens in our world is in the context of competition. For example, scientists compete with each other to be the first to make a new discovery, and they compete with each other for finite pools of grant money with which to conduct more research and make further discoveries.

I’ve heard the competitive pressures on scientists described as a useful way to motivate scientists to be clever and efficient (and not to knock off early lest some more dedicated lab get to your discovery first). But there are situations where it’s less obvious that fierce competition for scarce resources leads to choices that really align with the goal of building reliable knowledge about the world.

This week, on NPR’s Morning Edition, Richard Harris reported a pair of stories on how researchers who work with cells in culture grapple with the problem of their intended cell line being contaminated and overtaken by a different cell line. Harris tells us:

One of the worst cases involves a breast cancer cell line called MDA-435 (or MDA-MB-435). After the cell line was identified in 1976, breast cancer scientists eagerly adopted it.

When injected in animals, the cells spread the way breast cancer metastasizes in women, “and that’s not a very common feature of most breast cancer cell lines,” says Stephen Ethier, a cancer geneticist at the Medical University of South Carolina. “So as a result of that, people began asking for those cells, and so there are many laboratories all over the world, who have published hundreds of papers using the MDA-435 cell line as a model for breast cancer metastasis.”

In fact, scientists published more than a thousand papers with this cell line over the years. About 15 years ago, scientists using newly developed DNA tests took a close look at these cells. And they were shocked to discover that they weren’t from a breast cancer cell at all. The breast cancer cell line had been crowded out by skin cancer cells.

“We now know with certainty that the MDA-435 cell line is identical to a melanoma cell line,” Ethier says.

And it turns out that contamination traces back for decades. Several scientists published papers about this to alert the field, “but nevertheless, there are people out there who haven’t gotten the memo, apparently,” he says.

Decades’ worth of work and more than a thousand published research papers were supposed to add up to a lot of knowledge about a particular kind of breast cancer cell, except it wasn’t knowledge about breast cancer cells at all, because the cells in the cell line had been misidentified. Scientists probably know something from that work, but it isn’t the knowledge they thought they had before the contamination was detected.

On the basis of the discovery that this much knowledge-building had been compromised by being based on misidentified cells, you might imagine researchers would prioritize precise identification of the cells they use. But, as Harris found, this obvious bit of quality control meets resistance. For one thing, researchers seem unwilling to pay the extra financial costs it would take:

This may all come down to money. Scientists can avoid most of these problems by purchasing cells from a company that routinely tests them. But most scientists would rather walk down the hall and borrow cells from another lab.

“Academics share their cell lines like candy because they don’t want to go back and spend another $300,” said Richard Neve from Genentech. “It is economics. And they don’t want to spend another $100 to [verify] that’s still the same cell line.”

Note that scientists could still economize by sharing cell lines with their colleagues instead of purchasing them, while paying for the tests to nail down the identity of the shared cells. However, many do not.

(Consider, though, how awkward it might be to test cells you’ve gotten from a colleague only to discover that they are not the kind of cells your colleague thought they were. How do you break the news to your colleague that their work — including published papers in scientific journals — is likely to be mistaken and misleading? And how eager would other colleagues be to share their cell lines with you, knowing that you might bring them similarly bad news as a result of their generosity?)

Journals like Nature have tried to encourage scientists to test their cell lines by adding cell-line authentication to the checklist for authors submitting papers. Most authors do not check the box indicating they have tested their cells.

One result here is that the knowledge that comes from these studies and gets reported in scientific journals may not be as solid as it seems:

When scientists at [Genentech] find an intriguing result from an academic lab, the first thing they do is try to replicate the result.

Neve said often they can’t, and misidentified cells are a common reason.

This is a problem that is not just of concern to scientists. The rest of us depend on scientists to build reliable knowledge about the world in part because it might matter for what kinds of treatments are developed for diseases that affect us. Moreover, much of this research is paid for with public money — which means the public has an interest in whether the funding is doing what it is supposed to be doing.

However, Harris notes that funding agencies seem unwilling to act decisively to address the issue of research based on misidentified cell lines:

“We are fully convinced that this is a significant enough problem that we have to take steps to address it,” Jon Lorsch, director of the NIH’s National Institute of General Medical Sciences, said during the panel discussion.

One obvious step would be to require scientists who get federal funding to test their cells. Howard Soule, chief science officer at the Prostate Cancer Foundation, said that’s what his charity requires of the scientists it funds.

There’s a commercial lab that will run this test for about $140, so “this is not going to break the bank,” Soule said.

But Lorsch at the NIH argued that it’s not so simple on the scale at which his institute hands out funding. “We really can’t go and police 10,000 grants,” Lorsch said.

“Sure you can,” Soule shot back. “How can you not?”

Lorsch said if they do police this issue, “there are dozens and dozens of other issues” that the NIH should logically police as well. “It becomes a Hydra,” Lorsch said. “You know, you chop off one head and others grow.”

Biomedical research gets more expensive all the time, and the NIH is reluctant to pile on a whole bunch of new rules. It’s a balancing act.

“If we become too draconian we’re going to end up squashing creativity and slowing down research, which is not good for the taxpayers because they aren’t going to get as much for their money,” Lorsch said.

To my eye, Lorsch’s argument against requiring researchers to test their cells focuses on the competitive aspect of scientific research to the exclusion of the knowledge-building aspect.

What does it matter if the taxpayers get more research generated and published if a significant amount of that research output is irreproducible because of misidentified cells? In the absence of tests to properly identify the cells being used, there’s no clear way to tell just by looking at the journal articles which ones are reliable and which ones are not. Post-publication quality control requires researchers to repeat experiments and compare their results to those published, something that will cost significantly more than if the initial researchers tested their cells in the first place.

However, research funding is generally awarded to build new knowledge, not to test existing knowledge claims. Scientists get credit for making new discoveries, not for determining that other scientists’ discoveries can be reproduced.

NIH could make it a condition of funding that researchers working with cell lines get those cell lines tested, and arguably this would be the most cost-efficient way to ensure results that are reliable rather than based on misidentification. I find unpersuasive Lorsch’s claim that because there are dozens of other kinds of quality control of this sort NIH could demand, it cannot demand this one. Even if there are many things to fix, that doesn’t mean you must fix them all at once. Incremental improvements in quality control are surely better than none at all.

His further suggestion that engaging in NIH-mandated quality control will quash scientific creativity strikes me as silly. Scientists are at their most creative when they are working within constraints to solve problems. Indeed, were NIH to require that researchers test their cells, there is no reason to think this additional constraint could not be easily incorporated into researchers’ current competition for NIH funding.

The big question, really, is whether NIH is prioritizing funding a higher volume of research, or higher quality research. Presumably, the public is better served by a smaller number of published studies that make reliable claims about the actual cells researchers are working with than by a large number of published studies making hard-to-verify claims about misidentified cells.

If scientific competition is inescapable, at least let’s make sure that the incentives encourage the careful steps required to build reliable knowledge. If those careful steps are widely seen as an impediment to succeeding in the competition, we derail the very goal the competitive pressures were supposed to advance.

Grappling with the angry-making history of human subjects research, because we need to.

Teaching about the history of scientific research with human subjects bums me out.

Indeed, I get fairly regular indications from students in my “Ethics in Science” course that reading about and discussing the Nazi medical experiments and the U.S. Public Health Service’s Tuskegee syphilis experiment leaves them feeling grumpy, too.

Their grumpiness varies a bit depending on how they see themselves in relation to the researchers whose ethical transgressions are being inspected. Some of the science majors who identify strongly with the research community seem to get a little defensive, pressing me to see if these two big awful examples of human subjects research aren’t clear anomalies, the work of obvious monsters. (This is one reason I generally point out that, when it comes to historical examples of ethically problematic research with human subjects, the bench is deep: the U.S. government’s syphilis experiments in Guatemala, the MIT Radioactivity Center’s studies on kids with mental disabilities in a residential school, the harms done to Henrietta Lacks and to the family members who survived her by scientists working with HeLa cells, the National Cancer Institute- and Gates Foundation-funded studies of cervical cancer screening in India — to name just a few.) Some of the non-science majors in the class seem to look at their classmates who are science majors with a bit of suspicion.

Although I’ve been covering this material with my students since Spring of 2003, it was only a few years ago that I noticed that there was a strong correlation between my really bad mood and the point in the semester when we were covering the history of human subjects research. Indeed, I’ve come to realize that this is no mere correlation but a causal connection.

The harm that researchers have done to human subjects in order to build scientific knowledge in many of these historically notable cases makes me deeply unhappy. These cases involve scientists losing their ethical bearings and then defending indefensible actions as having been all in the service of science. It leaves me grumpy about the scientific community of which these researchers were a part, a community that did not obviously mark them as monsters or rogues. It leaves me grumpy about humanity.

In other contexts, my grumpiness might be no big deal to anyone but me. But in the context of my “Ethics in Science” course, I need to keep pessimism on a short leash. It’s kind of pointless to talk about what we ought to do if you’re feeling like people are going to be as evil as they can get away with being.

It’s important to talk about the Nazi doctors and the Tuskegee syphilis experiment so my students can see where formal statements about ethical constraints on human subject research (in particular, the Nuremberg Code and the Belmont Report) come from, what actual (rather than imagined) harms they are reactions to. To the extent that official rules and regulations are driven by very bad situations that the scientific community or the larger human community want to avoid repeating, history matters.

History also matters if scientists want to understand the attitudes of publics towards scientists in general and towards scientists conducting research with human subjects in particular. Newly-minted researchers who would never even dream of crossing the ethical lines the Nazi doctors or the Tuskegee syphilis researchers crossed may feel it deeply unfair that potential human subjects don’t default to trusting them. But that’s not how trust works. Ignoring the history of human subjects research means ignoring very real harms and violations of trust that have not faded from the collective memories of the populations that were harmed. Insisting that it’s not fair doesn’t magically earn scientists trust.

Grappling with that history, though, might help scientists repair trust and ensure that the research they conduct is actually worthy of trust.

It’s history that lets us start noticing patterns in the instances where human subjects research took a turn for the unethical. Frequently we see researchers working with human subjects whom they don’t see as fully human, or whose humanity seems less important than the piece of knowledge the researchers have decided to build. Or we see researchers who believe they are approaching questions “from the standpoint of pure science,” overestimating their own objectivity and good judgment.

This kind of behavior does not endear scientists to publics. Nor does it help researchers develop appropriate epistemic humility, a recognition that their objectivity is not an individual trait but rather a collective achievement of scientists engaging seriously with each other as they engage with the world they are trying to know. Nor does it help them build empathy.

I teach about the history of human subjects research because it is important to understand where the distrust between scientists and publics has come from. I teach about this history because it is crucial to understanding where current rules and regulations come from.

I teach about this history because I fully believe that scientists can — and must — do better.

And, because the ethical failings of past human subjects research were hardly ever the fault of monsters, we ought to grapple with this history so we can identify the places where individual human weaknesses, biases, and blind spots are likely to lead to ethical problems down the road. We need to build systems and social mechanisms to be accountable to human subjects (and to publics), to prioritize their interests, and never to lose sight of their humanity.

We can — and must — do better. But this requires that we seriously examine the ways that scientists have fallen short — even the ways that they have done evil. We owe it to future human subjects of research to learn from the ways scientists have failed past human subjects, to apply these lessons, to build something better.

Some thoughts about human subjects research in the wake of Facebook’s massive experiment.

You can read the study itself here, plus a very comprehensive discussion of reactions to the study here.

1. If you intend to publish your research in a peer-reviewed scientific journal, you are expected to have conducted that research with the appropriate ethical oversight. Indeed, the submission process usually involves explicitly affirming that you have done so (and providing documentation, in the case of human subjects research, of approval by the relevant Institutional Review Board(s) or of the IRB’s determination that the research was exempt from IRB oversight).

2. Your judgment, as a researcher, that your research will not expose your human subjects to especially big harms does not suffice to exempt that research from IRB oversight. The best way to establish that your research is exempt from IRB oversight is to submit your protocol to the IRB and have the IRB determine that it is exempt.

3. It’s not unreasonable for people to judge that violating their informed consent (say, by not letting them know that they are human subjects in a study where you are manipulating their environment and not giving them the opportunity to opt out of being part of your study) is itself a harm to them. When we value our autonomy, we tend to get cranky when others disregard it.

4. Researchers, IRBs, and the general public needn’t judge a study to be as bad as [fill in the name of a particularly horrific instance of human subjects research] to judge the conduct of the researchers in the study unethical. We can (and should) surely ask for more than “not as bad as the Tuskegee Syphilis Experiment”.

5. IRB approval of a study means that the research has received ethical oversight, but it does not guarantee that the treatment of human subjects in the research will be ethical. IRBs can make questionable ethical judgments too.

6. It is unreasonable to suggest that you can generally substitute Terms of Service or End User License Agreements for informed consent documents, as the latter are supposed to be clear and understandable to your prospective human subjects, while the former are written in such a way that even lawyers have a hard time reading and understanding them. The TOS or EULA is clearly designed to protect the company, not the user. (Some of those users, by the way, are in their early teens, which means they probably ought to be regarded as members of a “vulnerable population” entitled to more protection, not less.)

7. Just because a company like Facebook may “routinely” engage in manipulations of a user’s environment doesn’t make that kind of manipulation automatically ethical when it is done for the purposes of research. Nor does it mean that that kind of manipulation is ethical when Facebook does it for its own purposes. As it happens, peer-reviewed scientific journals, funding agencies, and other social structures tend to hold scientists building knowledge with human subjects research to higher ethical standards than (say) corporations are held to when they interact with humans. This doesn’t necessarily mean our ethical demands of scientific knowledge-builders are too high. Instead, it may mean that our ethical demands of corporations are too low.

How to be ethical while getting the public involved in your science.

At ScienceOnline Together later this week, Holly Menninger will be moderating a session on “Ethics, Genomics, and Public Involvement in Science”.

Because the ethical (and epistemic) dimensions of “citizen science” have been on my mind for a while now, in this post I share some very broad, pre-conference thoughts on the subject.

Ethics is a question of how we share a world with each other. Some of this is straightforward and short-term, but sometimes engaging each other ethically means taking account of long-range consequences, including possible consequences that may be difficult to foresee unless we really work to think through the possibilities ahead of time — and unless this thinking through of possibilities is informed by knowledge of some of the technologies involved and of history of what kinds of unforeseen outcomes have led to ethical problems before.

Ethics is more than merely meeting your current legal and regulatory requirements. Anyone taking that kind of minimalist approach to ethics is gunning to be a case study in an applied ethics class (probably within mere weeks of becoming a headline in a major news outlet).

With that said, if you’re running a project you’d describe as “citizen science” or as cultivating public involvement in science, here are some big questions I think you should be asking from the start:

1. What’s in it for the scientists?

Why are you involving members of the public in your project?

Are they in the field collecting observations that you wouldn’t have otherwise, or on their smart phones categorizing the mountains of data you’ve already collected? In these cases, the non-experts are providing labor you need for vital non-automatable tasks.

Are they sending in their biological samples (saliva, cheek swab, belly button swab, etc.)? In these cases, the non-experts are serving as human subjects, expanding the pool of samples in your study.

In both of these cases, scientists have ethical obligations to the non-scientists they are involving in their projects, although the ethical obligations are likely to be importantly different. In any case where a project involves humans as sources of biological samples, researchers ought to be consulting an Institutional Review Board, at least informally, before the project is initiated (which includes the start of anything that looks like advertising for volunteers who will provide their samples).

If volunteers are providing survey responses or interviews instead of vials of spit, there’s a chance they’re still acting as human subjects. Consult an IRB in the planning stages to be sure. (If your project is properly exempt from IRB oversight, there’s no better way to show it than an exemption letter from an IRB.)

If volunteers are providing biological samples from their pets or reports of observations of animals in the field (especially in fragile habitats), researchers ought to be consulting an Institutional Animal Care and Use Committee, at least informally, before the project is initiated. Again, it’s possible that what you’ll discover in this consultation is that the proposed research is exempt from IACUC oversight, but you want a letter from an IACUC to that effect.

Note that IRBs and IACUCs don’t exist primarily to make researchers’ lives hard! Rather, they exist to help researchers identify their ethical obligations to the humans and animals who serve as subjects of their studies, and to help find ways to conduct that research in ways that honor those obligations. A big reason to involve committees in thinking through the ethical dimensions of the research is that it’s hard for researchers to be objective in thinking through these questions about their own projects.

If you’re involving non-experts in your project in some other way, what are they contributing to the project? Are you involving them so you can check off the “broader impacts” box on your grant application, or is there some concrete way that involving members of the public is contributing to your knowledge-building? If the latter, think hard about what kinds of obligations might flow from that contribution.

2. What’s in it for the non-scientists/non-experts/members of the public involved in the project?

Why would members of the public want to participate in your project? What could they expect to get from such participation?

Maybe they enjoy being outdoors counting birds (and would be doing so even if they weren’t participating in the project), or looking at pictures of galaxies from space telescopes. Maybe they are curious about what’s in their genome or what’s in their belly-button. Maybe they want to help scientists build new knowledge enough to participate in some of the grunt-work required for that knowledge-building. Maybe they want to understand how that grunt-work fits into the knowledge-building scientists do.

It’s important to understand what the folks whose help you’re enlisting think they’re signing on for. Otherwise, they may be expecting something from the experience that you can’t give them. The best way to find out what potential participants are looking for from the experience is to ask them.

Don’t offer the prospect of diagnostic benefits from participation in a project when that kind of information is a long, long way off. Don’t promise that tracking the health of streams by screening for the presence of different kinds of bugs will be tons of fun without being clear about the conditions under which your volunteers will perform those screenings.

Don’t promise participants that they will be getting a feel for what it’s like to “do science” if, in fact, they are really just providing a sample rather than being part of the analysis or interpretation of that sample.

Don’t promise them that they will be involved in hypothesis-formation or conclusion-drawing if really you are treating them as fancy measuring devices.

3. What’s the relationship between the scientists and the non-scientists in this project? What consequences will this have for relationships between scientists and the public more generally?

There’s a big difference between involving members of the public in your project because it will be enriching for them personally and involving them in your project because it’s the only conceivable way to build a particular piece of knowledge you’re trying to build.

Being clear about the relationship upfront — here’s why we need you, here’s what you can expect in return (both the potential benefits of participation and the potential risks) — is the best way to make sure everyone’s interests are well-served by the partnership and that no one is being deceived.

Things can get complicated, though, when you pull the focus back from how participants are involved in building the knowledge and consider how that knowledge might be used.

Will the new knowledge primarily benefit the scientists leading the project, adding publications to their CVs and helping them make the case for funding for further projects? Could the new knowledge contribute to our understanding (of ecosystems, or human health, for example) in ways that will drive useful interventions? Will those interventions be driven by policy-makers or commercial interests? Will the scientists be a part of this discussion of how the knowledge gets used? Will the members of the public (either those who participated in the project or members of the public more generally) be a part of this discussion — and will their views be taken seriously?

To the extent that participating in a citizen science project, whatever shape that participation may take, can influence non-scientists’ views of science and the scientific community as a whole, the interactions between scientists and volunteers in and around these projects are hugely important. They are an opportunity for people with different interests, different levels of expertise, and different values to find common ground while working together to achieve a shared goal — to communicate honestly, deal with each other fairly, and take each other seriously.

More such ethical engagement between scientists and publics would be a good thing.

But the flip-side is that engagements between scientists and publics that aren’t as honest or respectful as they should be may have serious negative impacts beyond the particular participants in a given citizen science project. They may make healthy engagement, trust, and accountability harder for scientists and publics across the board.

In other words, working hard to do it right is pretty important.

I may have more to say about this after the conference. In the meantime, you can add your questions or comments to the session discussion forum.

Ethical and practical issues for uBiome to keep working on.

Earlier this week, the Scientific American Guest Blog hosted a post by Jessica Richman and Zachary Apte, two members of the team at uBiome, a crowdfunded citizen science start-up. Back in February, as uBiome was in the middle of its crowdfunding drive, a number of bloggers (including me) voiced worries that some of the ethical issues of the uBiome project might require more serious attention. Partly in response to those critiques, Richman and Apte’s post offers their perspectives on Institutional Review Boards (IRBs) and on why, in their present configuration, IRBs seem suboptimal for commercial citizen science initiatives.

Their post provides food for thought, but there are some broader issues about which I think the uBiome team should think a little harder.

Ethics takes more than simply meeting legal requirements.

Consulting with lawyers to ensure that your project isn’t breaking any laws is a good idea, but it’s not enough. Meeting legal requirements is not sufficient to meet your ethical obligations (which are well and truly obligations even when they lack the force of law).

Now, it’s the case that there is often something like the force of law deployed to encourage researchers (among others) not to ignore their ethical obligations. If you accept federal research funds, for example, you are entering into a contract, one of whose conditions is working within federal guidelines for the ethical use of animal or human subjects. If you don’t want the government to enforce this agreement, you can certainly opt out of taking the federal funds.

However, opting out of federal funding does not remove your ethical duties to animals or human subjects. It may remove the government’s involvement in making you live up to your ethical obligations, but the ethical obligations are still there.

This is a tremendously important point — especially in light of a long history of human subjects research in which researchers have often not even recognized their ethical obligations to human subjects, let alone had a good plan for living up to them.

Here, it is important to seek good ethical advice (as distinct from legal advice), from an array of ethicists, including some who see potential problems with your plans. If none of the ethicists you consult see anything to worry about, you probably need to ask a few more! Take the potential problems they identify seriously. Think through ways to manage the project to avoid those problems. Figure out a way to make things right if a worst case scenario should play out.

In a lot of ways, problems that uBiome encountered with the reception of its plan seemed to flow from a lack of good — and challenging — ethical advice. There are plenty of other people and organizations doing citizen science projects that are similar enough to uBiome (from the point of view of interactions with potential subjects/participants), and many of these have experience working with IRBs. Finding them and asking for their guidance could have helped the uBiome team foresee some of the issues with which they’re dealing now, somewhat late in the game.

There are more detailed discussions of the chasm between what satisfies the law and what’s ethical at The Broken Spoke and Drugmonkey. You should, as they say, click through and read the whole thing.

Some frustrations with IRBs may be based on a misunderstanding of how they work.

An Institutional Review Board, or IRB, is a body that examines scientific protocols to determine whether they meet ethical requirements in their engagement of human subjects (including humans who provide tissue or other material to a study). The requirement for independent ethical evaluation of experimental protocols was first articulated in the World Medical Association’s Declaration of Helsinki, which states:

The research protocol must be submitted for consideration, comment, guidance and approval to a research ethics committee before the study begins. This committee must be independent of the researcher, the sponsor and any other undue influence. It must take into consideration the laws and regulations of the country or countries in which the research is to be performed as well as applicable international norms and standards but these must not be allowed to reduce or eliminate any of the protections for research subjects set forth in this Declaration. The committee must have the right to monitor ongoing studies. The researcher must provide monitoring information to the committee, especially information about any serious adverse events. No change to the protocol may be made without consideration and approval by the committee.

(Bold emphasis added.)

In their guest post, Richman and Apte assert, “IRBs are usually associated with an academic institution, and are provided free of charge to members of that institution.”

It may appear that the services of an IRB are “free” to those affiliated with the institution, but they aren’t really. Surely it costs the institution money to run the IRB — to hire a coordinator, to provide ethics training resources for IRB members and to faculty, staff, and students involved in human subjects research, to (ideally) give release time to faculty and staff on the IRB so they can actually devote the time required to consider protocols, comment upon them, provide guidance to PIs, and so forth.

Administrative costs are part of institutional overhead, and there’s a reasonable expectation that researchers whose protocols come before the IRB will take a turn serving on the IRB at some point. So IRBs most certainly aren’t free.

Now, given that the uBiome team was told they couldn’t seek approval from the IRBs at any institutions where they plausibly could claim an affiliation, and given the expense of seeking approval from a private-sector IRB, I can understand why they might have been hesitant to put money down for IRB approval up front. They started with no money for their proposed project. If the project itself ended up being a no-go due to insufficient funding, spending money on IRB approval would seem pointless.

However, it’s worth making it clear that expense is not in itself a sufficient reason to do without ethical oversight. IRB oversight costs money (even in an academic institution where those costs are invisible to PIs because they’re bundled into institutional overhead). Research in general costs money. If you can’t swing the costs (including those of proper ethical oversight), you can’t do the research. That’s how it goes.

Richman and Apte go on:

[W]e wanted to go even further, and get IRB approval once we were funded — in case we wanted to publish, and to ensure that our customers were well-informed of the risks and benefits of participation. It seemed the right thing to do.

So, we decided to wait until after crowdfunding and, if the project was successful, submit for IRB approval at that point.

Getting IRB approval at some point in the process is better than getting none at all. However, some of the worries people (including me) were expressing while uBiome was at the crowdfunding stage of the process (before IRB approval) were focused on how the lines between citizen scientist, human subject, and customer were getting blurred.

Did donors to the drive believe that, by virtue of their donations, they were guaranteed to be enrolled in the study (as sample providers)? Did they have a reasonable picture of the potential benefits of their participation? Did they have a reasonable picture of the potential risks of their participation?

These are not questions we leave to PIs. To assess them objectively, we put these questions before a neutral third party … the IRB.

If the expense of formal IRB consideration of the uBiome protocol was prohibitive during the crowdfunding stage, it surely would have gone some way toward meeting their ethical duties if the uBiome team had vetted the language in their crowdfunding drive with independent folks attentive to human subjects protection issues. That the ethical questions raised by their fundraising drive were so glaringly obvious to so many of us suggests that skipping this step was not a good call.

We next arrive at the issue of the for-profit IRB. Richman and Apte write:

Some might criticize the fact that we are using a private firm, one not connected with a prestigious academic institution. We beg to differ. This is the same institution that works with academic IRBs that need to coordinate multi-site studies, as well as private firms such as 23andme and pharmaceutical companies doing clinical trials. We agree that it’s kind of weird to pay for ethical review, but that is the current system, and the only option available to us.

I don’t think paying for IRB review is the ethical issue. If one were paying for IRB approval, that would be an ethical issue, and there are some well-known rubber-stamp-y private IRBs out there.

Carl Elliott details some of the pitfalls of the for-profit IRB in his book White Coat, Black Hat. The most obvious of these is that, in a competition for clients, a for-profit IRB might well feel pressure to forgo asking the hard questions and to be less ethically rigorous (and more rubber-stamp-y), lest clients seeking approval take their business to a competing IRB they see as more likely to grant that approval with less hassle.

Market forces may provide good solutions to some problems, but it’s not clear that the problem of how to make research more ethical is one of them. Also, it’s worth noting that being a citizen science project does not in and of itself preclude review by an academic IRB; plenty of citizen science projects run by academic scientists get exactly that kind of review. It’s uBiome’s status as a private-sector citizen science project that led to the need to find another IRB.

That said, if folks with concerns knew which private IRB the uBiome team used (something they don’t disclose in their guest post), those folks could inspect the IRB’s track record for rigor and make a judgment from that.

Richman and Apte cite as further problems with IRBs, at least as currently constituted, lack of uniformity across committees and lack of transparency. The lack of uniformity is by design, the thought being that local control of committees should make them more responsive to local concerns (including those of potential subjects). Indeed, when research is conducted by collaborators from multiple institutions, one of the marks of good ethical design is when different local IRBs are comfortable approving the protocol. As well, at least part of the lack of transparency is aimed at human subjects protection — for example, ensuring that the privacy of human subjects is not compromised in the release of approved research protocols.

This is not to say that there is no reasonable discussion to be had about striving for more IRB transparency and more consistency between IRBs. However, such a discussion should center on ethical considerations, not convenience or expediency.

Focusing on tone rather than substance makes it look like you don’t appreciate the substance of the critique.

Richman and Apte write the following of the worries bloggers raised with uBiome:

Some of the posts threw us off quite a bit as they seemed to be personal attacks rather than reasoned criticisms of our approach. …

We thought it was a bit… much, shall we say, to compare us to the Nazis (yes, that happened, read the posts) or to the Tuskegee Experiment because we funded our project without first paying thousands of dollars for IRB approval for a project that had not (and might never have) happened.

I have read all of the linked posts (here, here, here, here, here, here, here, and here) that Richman and Apte point to in leveling this complaint about tone. I don’t read them as comparing the uBiome team to Nazis or the researchers who oversaw the Tuskegee Syphilis Experiment.

I’m willing to stipulate that the tone of some of these posts was not at all cuddly. It may have made members of the uBiome team feel defensive.

However, addressing the actual ethical worries raised in these posts would have done a lot more for uBiome’s efforts to earn the public’s trust than adopting a defensive posture did.

Make no mistake, harsh language or not, the posts critical of uBiome were written by a bunch of people who know an awful lot about the ins and outs of ethical interactions with human subjects. These are also people who recognize from their professional lives that, while hard questions can feel like personal attacks, they still need to be answered. They are raising ethical concerns not to be pains, but because they think protecting human subjects matters — as does protecting the collective reputation of those who do human subjects research and/or citizen science.

Trust is easier to break than to build, which means one project’s ethical problems could be enough to sour the public on even the carefully designed projects of researchers who have taken much more care thinking through the ethical dimensions of their work. Addressing potential problems in advance seems like a better policy than hoping they’ll be no big deal.

And losing focus on the potential problems because you don’t like the way in which they were pointed out seems downright foolish.

Much of uBiome’s response to the hard questions raised about the ethics of their project has focused on tone, or on meeting examples that provide historical context for our ethical guidelines for human subjects research with the protestation, “We’re not like that!” If nothing else, this suggests that the uBiome team hasn’t understood the point those examples are meant to convey, or the patterns they illuminate in terms of ethical pitfalls into which even non-evil scientists can fall if they’re not careful.

And it is not at all clear that the uBiome team’s tone in blog comments and on social media like Twitter has done much to help its case.

What is still lacking, amidst all their complaints about the tone of the critiques, is a clear account of how basic ethical questions (such as how uBiome will ensure that the joint roles of customer, citizen science participant, and human subject don’t lead to a compromise of autonomy or privacy) are being answered in uBiome’s research protocol.

A conversation on the substance of the critiques would be more productive here than one about who said something mean to whom.

Which brings me to my last issue:

New models of scientific funding, subject recruitment, and outreach that involve the internet are better served by teams that understand how the internet works.

Let’s say you’re trying to fund a project, recruit participants, and build general understanding, enthusiasm, support, and trust. Let’s say that your efforts involve websites where you put out information and social media accounts where you amplify some of that information or push links to your websites or to favorable media coverage.

People looking at the information you’ve put out there are going to draw conclusions based on the information you’ve made public. They may also draw speculative conclusions from the gaps — the information you haven’t made public.

You cannot, however, count on them to base their conclusions on information to which they’re not privy, including what’s in your heart.

There may be all sorts of good efforts happening behind the scenes to get rigorous ethical oversight off the ground. If it’s invisible to the public, there’s no reason the public should assume it’s happening.

If you want people to draw more accurate conclusions about what you’re doing, and about what potential problems might arise (and how you’re preparing to face them if they do), a good way to go is to make more information public.

Also, recognize that you’re involved in a conversation that is being conducted publicly. Among other things, this means it’s unreasonable to expect people with concerns to take it to private email in order to get further information from you. You’re the one with a project that relies on cultivating public support and trust; you need to put the relevant information out there!

(What relevant information? Certainly the information relevant to responding to concerns and critiques articulated in the above-linked blog posts would be a good place to start — which is yet another reason why it’s good to be able to get past tone and understand substance.)

In a world where people email privately to get the information that might dispel their worries, those people are the only ones whose worries are addressed. The rest of the public that’s watching (but not necessarily tweeting, blogging, or commenting) doesn’t get that information (especially if you ask the people you email not to share the content of that email publicly). You may have fully lost their trust with nary a sign in your inboxes.

Maybe you wish the dynamics of the internet were different. Some days I do, too. But unless you’re going to fix the internet prior to embarking on your brave new world of crowdfunded citizen science, paying some attention to the dynamics as they are now will help you use it productively, rather than to create misunderstandings and distrust that then require remediation.

That could clear the way to a much more interesting and productive conversation between uBiome, other researchers, and the larger public.

Strategies to address questionable statistical practices.

If you have not yet read all you want to read about the wrongdoing of social psychologist Diederik Stapel, you may be interested in reading the 2012 Tilburg Report (PDF) on the matter. The full title of the English translation is “Flawed science: the fraudulent research practices of social psychologist Diederik Stapel” (in Dutch, “Falende wetenschap: De frauduleuze onderzoekspraktijken van sociaal-psycholoog Diederik Stapel”), and it’s 104 pages long, which might make it beach reading for the right kind of person.

If you’re not quite up to the whole report, Error Statistics Philosophy has a nice discussion of some of the highlights. In that post, D. G. Mayo writes:

The authors of the Report say they never anticipated giving a laundry list of “undesirable conduct” by which researchers can flout pretty obvious requirements for the responsible practice of science. It was an accidental byproduct of the investigation of one case (Diederik Stapel, social psychology) that they walked into a culture of “verification bias”. Maybe that’s why I find it so telling. It’s as if they could scarcely believe their ears when people they interviewed “defended the serious and less serious violations of proper scientific method with the words: that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences” (Report 48). …

I would place techniques for ‘verification bias’ under the general umbrella of techniques for squelching stringent criticism and repressing severe tests. These gambits make it so easy to find apparent support for one’s pet theory or hypotheses, as to count as no evidence at all (see some from their list ). Any field that regularly proceeds this way I would call a pseudoscience, or non-science, following Popper. “Observations or experiments can be accepted as supporting a theory (or a hypothesis, or a scientific assertion) only if these observations or experiments are severe tests of the theory.”

You’d imagine this would raise the stakes pretty significantly for the researcher who could be teetering on the edge of verification bias: fall off that cliff and what you’re doing is no longer worthy of the name scientific knowledge-building.

Psychology, after all, is one of those fields given a hard time by people in “hard sciences,” which are popularly reckoned to be more objective, more revealing of actual structures and mechanisms in the world — more science-y. Fair or not, this might mean that psychologists have something to prove about their hardheadedness as researchers, about the stringency of their methods. Some peer pressure within the field to live up to such standards would obviously be a good thing — and certainly, it would be a better thing for the scientific respectability of psychology than an “everyone is doing it” excuse for less stringent methods.

Plus, isn’t psychology a field whose practitioners should have a grip on the various cognitive biases to which we humans fall prey? Shouldn’t psychologists understand better than most the wisdom of putting structures in place (whether embodied in methodology or in social interactions) to counteract those cognitive biases?

Remember that part of Stapel’s M.O. was keeping current with the social psychology literature so he could formulate hypotheses that fit very comfortably with researchers’ expectations of how the phenomena they studied behaved. Then, fabricating the expected results for his “investigations” of these hypotheses, Stapel caught peer reviewers being credulous rather than appropriately skeptical.

Short of trying to reproduce Stapel’s reported experiments themselves, how could peer reviewers avoid being fooled? Mayo has a suggestion:

Rather than report on believability, researchers need to report the properties of the methods they used: What was their capacity to have identified, avoided, admitted verification bias? The role of probability here would not be to quantify the degree of confidence or believability in a hypothesis, given the background theory or most intuitively plausible paradigms, but rather to check how severely probed or well-tested a hypothesis is — whether the assessment is formal, quasi-formal or informal. Was a good job done in scrutinizing flaws … or a terrible one? Or was there just a bit of data massaging and cherry picking to support the desired conclusion? As a matter of routine, researchers should tell us.

I’m no social psychologist, but this strikes me as a good concrete step that could help peer reviewers make better evaluations — and that should help scientists who don’t want to fool themselves (let alone their scientific peers) to be clearer about what they really know and how well they really know it.
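To make concrete how little evidential weight such gambits carry, here is a minimal simulation sketch (my own illustration, not anything from the Tilburg Report or from Mayo; the setup and parameters are hypothetical) of one familiar questionable practice: peeking at accumulating data and stopping as soon as a test comes out “significant.” Even when there is no real effect at all, this procedure turns up apparent support far more often than the nominal 5% error rate suggests, which is exactly the sense in which a result produced this way has not been severely tested.

```python
# A rough sketch of optional stopping under a true null hypothesis.
# Both groups are drawn from the same distribution, so every
# "significant" result is a false positive by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def study_with_peeking(max_n=100, batch=10, alpha=0.05):
    """Run one two-group 'study', checking the p-value after every
    batch of observations and stopping at the first p < alpha."""
    a = np.empty(0)
    b = np.empty(0)
    while a.size < max_n:
        a = np.append(a, rng.normal(0, 1, batch))
        b = np.append(b, rng.normal(0, 1, batch))
        _, p = stats.ttest_ind(a, b, equal_var=False)
        if p < alpha:
            return True   # stop early and report a "significant" effect
    return False          # report the honest null result

n_studies = 2000
false_positives = sum(study_with_peeking() for _ in range(n_studies))
print(f"False-positive rate with peeking: {false_positives / n_studies:.2f}")
# With a single pre-planned test the rate would be about 0.05; with a
# peek after every batch it climbs to roughly 0.15-0.20 in this setup.
```

The point of the exercise is the one Mayo presses: a reader cannot assess the result without being told about the stopping rule, which is a property of the method rather than of the data.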

The continuum between outright fraud and “sloppy science”: inside the frauds of Diederik Stapel (part 5).

It’s time for one last look at the excellent article by Yudhijit Bhattacharjee in the New York Times Magazine (published April 26, 2013) on social psychologist and scientific fraudster Diederik Stapel. We’ve already examined the strategy Stapel pursued to fabricate persuasive “results”, the particular harms Stapel’s misconduct did to the graduate students he was training, and the apprehensions that students and colleagues who suspected fraud was afoot felt about blowing the whistle on Stapel. To close, let’s look at some of the uncomfortable lessons the Stapel case holds for his scientific community — and perhaps for other scientific communities as well.

Bhattacharjee writes:

At the end of November, the universities unveiled their final report at a joint news conference: Stapel had committed fraud in at least 55 of his papers, as well as in 10 Ph.D. dissertations written by his students. The students were not culpable, even though their work was now tarnished. The field of psychology was indicted, too, with a finding that Stapel’s fraud went undetected for so long because of “a general culture of careless, selective and uncritical handling of research and data.” If Stapel was solely to blame for making stuff up, the report stated, his peers, journal editors and reviewers of the field’s top journals were to blame for letting him get away with it. The committees identified several practices as “sloppy science” — misuse of statistics, ignoring of data that do not conform to a desired hypothesis and the pursuit of a compelling story no matter how scientifically unsupported it may be.

The adjective “sloppy” seems charitable. Several psychologists I spoke to admitted that each of these more common practices was as deliberate as any of Stapel’s wholesale fabrications. Each was a choice made by the scientist every time he or she came to a fork in the road of experimental research — one way pointing to the truth, however dull and unsatisfying, and the other beckoning the researcher toward a rosier and more notable result that could be patently false or only partly true. What may be most troubling about the research culture the committees describe in their report are the plentiful opportunities and incentives for fraud. “The cookie jar was on the table without a lid” is how Stapel put it to me once. Those who suspect a colleague of fraud may be inclined to keep mum because of the potential costs of whistle-blowing.

The key to why Stapel got away with his fabrications for so long lies in his keen understanding of the sociology of his field. “I didn’t do strange stuff, I never said let’s do an experiment to show that the earth is flat,” he said. “I always checked — this may be by a cunning manipulative mind — that the experiment was reasonable, that it followed from the research that had come before, that it was just this extra step that everybody was waiting for.” He always read the research literature extensively to generate his hypotheses. “So that it was believable and could be argued that this was the only logical thing you would find,” he said. “Everybody wants you to be novel and creative, but you also need to be truthful and likely. You need to be able to say that this is completely new and exciting, but it’s very likely given what we know so far.”

Fraud like Stapel’s — brazen and careless in hindsight — might represent a lesser threat to the integrity of science than the massaging of data and selective reporting of experiments. The young professor who backed the two student whistle-blowers told me that tweaking results — like stopping data collection once the results confirm a hypothesis — is a common practice. “I could certainly see that if you do it in more subtle ways, it’s more difficult to detect,” Ap Dijksterhuis, one of the Netherlands’ best known psychologists, told me. He added that the field was making a sustained effort to remedy the problems that have been brought to light by Stapel’s fraud.

(Bold emphasis added.)

If the writers of this report are correct, the field of psychology failed in multiple ways here. First, its members were insufficiently skeptical — both of Stapel’s purported findings and of their own preconceptions — to nip Stapel’s fabrications in the bud. Second, they were themselves routinely engaging in practices that were bound to mislead.

Maybe these practices don’t rise to the level of outright fabrication. However, neither do they rise to the level of rigorous and intellectually honest scientific methodology.

There could be a number of explanations for these questionable methodological choices.

Possibly some of the psychologists engaging in this “sloppy science” lack a good understanding of statistics or of what counts as a properly rigorous test of one’s hypothesis. Essentially, this is an explanation of faulty methodology on the basis of ignorance. However, it’s likely that this is culpable ignorance — that psychology researchers have a positive duty to learn what they ought to know about statistics and hypothesis testing, and to avail themselves of available resources to ensure that they aren’t ignorant in this particular way.

I don’t know if efforts to improve statistics education are a part of the “sustained effort to remedy the problems that have been brought to light by Stapel’s fraud,” but I think they should be.

Another explanation for the lax methodology decried by the report is alluded to in the quoted passage: perhaps psychology researchers let the strength of their own intuitions about what they were going to see in their research results drive their methodology. Perhaps they unconsciously drifted away from methodological rigor and toward cherry-picking and misuse of statistics and the like because they knew in their hearts what the “right” answer would be. Given this kind of conviction, of course they would reject methods that didn’t yield the “right” answer in favor of those that did.

Here, too, the explanation does not provide an excuse. The scientist’s brief is not to take strong intuitions as true, but to look for evidence — especially evidence that could demonstrate that the intuitions are wrong. A good scientist should be on the alert for instances where she is being fooled by her intuitions. Rigorous methodology is one of the tools at her disposal to avoid being fooled. Organized skepticism from her fellow scientists is another.

From here, the explanations drift into waters where the researchers are even more culpable for their sloppiness. If you understand how to test hypotheses properly, and if you’re alert enough to the seductive power of your intuitions, it seems like the other reason you might engage in “sloppy science” is to make your results look less ambiguous, more certain, more persuasive than they really are, either to your fellow scientists or to others (administrators evaluating your tenure or promotion case? the public?). Knowingly providing a misleading picture of how good your results are is lying. It may be a lie of a smaller magnitude than Diederik Stapel’s full-scale fabrications, but it’s still dishonest.

And of course, there are plenty of reasons scientists (like other human beings) might try to rationalize a little lie as being not that bad. Maybe you really needed more persuasive preliminary data than you got to land the grant without which you won’t be able to support graduate students. Maybe you needed to make your conclusions look stronger to satisfy the notoriously difficult peer reviewers at the journal to which you submitted your manuscript. Maybe you are on the verge of getting credit for a paradigm-shaking insight in your field (if only you can put up the empirical results to support it), or of beating a competing research group to the finish line for an important discovery (if only you can persuade your peers that the results you have establish that discovery).

But maybe all these excuses prioritize scientific scorekeeping to the detriment of scientific knowledge-building.

Science is supposed to be an activity aimed at building a reliable body of knowledge about the world. You can’t reconcile this with lying, whether to yourself or to your fellow scientists. This means that scientists who are committed to the task must refrain from the little lies, and that they must take serious conscious steps to ensure that they don’t lie to themselves. Anything else runs the risk of derailing the whole project.

Reluctance to act on suspicions about fellow scientists: inside the frauds of Diederik Stapel (part 4).

It’s time for another post in which I chew on some tidbits from Yudhijit Bhattacharjee’s incredibly thought-provoking New York Times Magazine article (published April 26, 2013) on social psychologist and scientific fraudster Diederik Stapel. (You can also look at the tidbits I chewed on in part 1, part 2, and part 3.) This time I consider the question of why it was that, despite mounting clues that Stapel’s results were too good to be true, other scientists in Stapel’s orbit were reluctant to act on their suspicions that Stapel might be up to some sort of scientific misbehavior.

Let’s look at how Bhattacharjee sets the scene in the article:

[I]n the spring of 2010, a graduate student noticed anomalies in three experiments Stapel had run for him. When asked for the raw data, Stapel initially said he no longer had it. Later that year, shortly after Stapel became dean, the student mentioned his concerns to a young professor at the university gym. Each of them spoke to me but requested anonymity because they worried their careers would be damaged if they were identified.

The bold emphasis here (and in the quoted passages that follow) is mine. I find it striking that even now, when Stapel has essentially been fully discredited as a trustworthy scientist, these two members of the scientific community feel safer not being identified. It’s not entirely obvious to me whether their worry is being identified as people who suspected that fabrication was taking place but said nothing to launch official inquiries, or whether they fear that being identified as someone who was suspicious of a fellow scientist could harm their standing in the scientific community.

If you dismiss that second possibility as totally implausible, read on:

The professor, who had been hired recently, began attending Stapel’s lab meetings. He was struck by how great the data looked, no matter the experiment. “I don’t know that I ever saw that a study failed, which is highly unusual,” he told me. “Even the best people, in my experience, have studies that fail constantly. Usually, half don’t work.”

The professor approached Stapel to team up on a research project, with the intent of getting a closer look at how he worked. “I wanted to kind of play around with one of these amazing data sets,” he told me. The two of them designed studies to test the premise that reminding people of the financial crisis makes them more likely to act generously.

In early February, Stapel claimed he had run the studies. “Everything worked really well,” the professor told me wryly. Stapel claimed there was a statistical relationship between awareness of the financial crisis and generosity. But when the professor looked at the data, he discovered inconsistencies confirming his suspicions that Stapel was engaging in fraud.

If one has suspicions about how reliable a fellow scientist’s results are, doing some empirical investigation seems like the right thing to do. Keeping an open mind and then examining the actual data might well show one’s suspicions to be unfounded.

Of course, that’s not what happened here. So, given this stronger empirical support for his doubts — not to mention the fact that scientists are trying to build a shared body of knowledge, which means that unreliable papers in the literature can hurt the knowledge-building efforts of other scientists who trust that the work reported there was done honestly — you would think the time was right for this professor to pass on what he had found to those at the university who could investigate further. Right?

The professor consulted a senior colleague in the United States, who told him he shouldn’t feel any obligation to report the matter.

For all the talk of science, and the scientific literature, being “self-correcting,” it’s hard to imagine the precise mechanism for such self-correction in a world where no scientist who is aware of likely scientific misconduct feels any obligation to report the matter.

But the person who alerted the young professor, along with another graduate student, refused to let it go. That spring, the other graduate student examined a number of data sets that Stapel had supplied to students and postdocs in recent years, many of which led to papers and dissertations. She found a host of anomalies, the smoking gun being a data set in which Stapel appeared to have done a copy-paste job, leaving two rows of data nearly identical to each other.

The two students decided to report the charges to the department head, Marcel Zeelenberg. But they worried that Zeelenberg, Stapel’s friend, might come to his defense. To sound him out, one of the students made up a scenario about a professor who committed academic fraud, and asked Zeelenberg what he thought about the situation, without telling him it was hypothetical. “They should hang him from the highest tree” if the allegations were true, was Zeelenberg’s response, according to the student.

Some might think these students were being excessively cautious, but the sad fact is that scientists faced with allegations of misconduct against a colleague — especially allegations brought by students — frequently side with their colleague and retaliate against those making the allegations. Students, after all, are new members of one’s professional community, so green that one might not even think of them as full members. They are low in status, they are still learning how things work, and they are easily presumed to have misunderstood what they have seen. And, in contrast to one’s colleagues, students are transients. They are just passing through the training program, whereas you might hope to be with your colleagues for your whole professional life. In a case of dueling testimony, who are you more likely to believe?

Maybe the question should be whether your bias towards believing one over the other is strong enough to keep you from examining the available evidence to determine whether your trust is misplaced.

The students waited till the end of summer, when they would be at a conference with Zeelenberg in London. “We decided we should tell Marcel at the conference so that he couldn’t storm out and go to Diederik right away,” one of the students told me.

In London, the students met with Zeelenberg after dinner in the dorm where they were staying. As the night wore on, his initial skepticism turned into shock. It was nearly 3 when Zeelenberg finished his last beer and walked back to his room in a daze. In Tilburg that weekend, he confronted Stapel.

It might not be universally true, but at least some of the people who will lie about their scientific findings in a journal article will lie right to your face about whether they obtained those findings honestly. Yet lots of us think we can tell — at least with the people we know — whether they are being honest with us. This hunch can be just as wrong as the wrongest scientific hunch waiting for us to accumulate empirical evidence against it.

The students seeking Zeelenberg’s help in investigating Stapel’s misbehavior created a situation in which Zeelenberg would have to look at the empirical evidence before he looked his colleague in the eye and asked him whether he was fabricating his results. They had already gotten him to say, at least in the abstract, that the kind of behavior they had reason to believe Stapel was committing was unacceptable in their scientific community. To make a conscious decision to ignore the empirical evidence would have meant Zeelenberg would have to see himself as displaying a kind of intellectual dishonesty — because if fabrication is harmful to science, it is harmful to science no matter who perpetrates it.

As it was, Zeelenberg likely had to make the painful concession that he had misjudged his colleague’s character and trustworthiness. But having wrong hunches in science is much less of a crime than clinging to those hunches in the face of mounting evidence against them.

Doing good science requires a delicate balance of trust and accountability. Scientists’ default position is to trust that other scientists are making honest efforts to build reliable scientific knowledge about the world, using empirical evidence and methods of inference that they display for the inspection (and critique) of their colleagues. Not to hold this default position means you have to build all your knowledge of the world yourself (which makes achieving anything like objective knowledge really hard). However, this trust is not unconditional, which is where the accountability comes in. Scientists recognize that they need to be transparent about what they did to build the knowledge — to be accountable when other scientists ask questions or disagree about conclusions — or else that trust evaporates. When the evidence warrants it, distrusting a fellow scientist is not mean or uncollegial — it’s your duty. We need the help of others to build scientific knowledge, but if they insist that we ignore evidence of their scientific misbehavior, they’re not actually helping.

Scientific training and the Kobayashi Maru: inside the frauds of Diederik Stapel (part 3).

This post continues my discussion of issues raised in the article by Yudhijit Bhattacharjee in the New York Times Magazine (published April 26, 2013) on social psychologist and scientific fraudster Diederik Stapel. Part 1 looked at how expecting to find a particular kind of order in the universe may leave a scientific community more vulnerable to a fraudster claiming to have found results that display just that kind of order. Part 2 looked at some of the ways Stapel’s conduct did harm to the students he was supposed to be training to be scientists. Here, I want to point out another way that Stapel failed his students — ironically, by shielding them from failure.

Bhattacharjee writes:

[I]n the spring of 2010, a graduate student noticed anomalies in three experiments Stapel had run for him. When asked for the raw data, Stapel initially said he no longer had it. Later that year, shortly after Stapel became dean, the student mentioned his concerns to a young professor at the university gym. Each of them spoke to me but requested anonymity because they worried their careers would be damaged if they were identified.

The professor, who had been hired recently, began attending Stapel’s lab meetings. He was struck by how great the data looked, no matter the experiment. “I don’t know that I ever saw that a study failed, which is highly unusual,” he told me. “Even the best people, in my experience, have studies that fail constantly. Usually, half don’t work.”

In the next post, we’ll look at how this other professor’s curiosity about Stapel’s too-good-to-be-true results led to the unraveling of Stapel’s fraud. But I think it’s worth pausing here to say a bit more on how very odd a training environment Stapel’s research group provided for his students.

None of his studies failed. Since, as we saw in the last post, Stapel was also conducting (or, more accurately, claiming to conduct) his students’ studies, that means none of his students’ studies failed.

This is pretty much the opposite of every graduate student experience in an empirical field that I have heard described. Most studies fail. Getting to a 50% success rate with your empirical studies is a significant achievement.

Graduate students who are also Trekkies usually come to recognize that the travails of empirical studies are like a version of the Kobayashi Maru.

Introduced in Star Trek II: The Wrath of Khan, the Kobayashi Maru is a training simulation in which Star Fleet cadets are presented with a civilian ship in distress. Saving the civilians requires the cadet to violate a treaty by entering the Neutral Zone (and in the simulation, this choice results in a Klingon attack and the boarding of the cadet’s ship). Honoring the treaty, on the other hand, means abandoning the civilians and their disabled ship in the Neutral Zone. The Kobayashi Maru is designed as a “no-win” scenario; the intent of the test is to discover how trainees face such a situation. Wikipedia notes that, owing to James T. Kirk’s performance on the test, some Trekkies also view the Kobayashi Maru as a problem whose solution depends on redefining the problem.

Scientific knowledge-building turns out to be packed with plans that cannot succeed at yielding the particular pieces of knowledge the scientists hope to discover. This is because scientists formulate plans on the basis of what is already known in order to reveal what isn’t yet known — so knowing where to look, what tools to use to do the looking, and what other features of the world are lurking to confound your ability to get clear information with those tools, is pretty hard.

Failed attempts happen. If failure is the sort of thing that will crush your spirit and leave you unable to shake it off and try again, or to come up with a new strategy, then the life of a scientist will be a pretty hard one for you.

Grown-up scientists have studies fail all the time. Graduate students training to be scientists do, too. But graduate students also have mentors who are supposed to help them bounce back from failure — to figure out the most likely sources of failure, whether it’s worth trying the study again, whether a new approach would be better, whether some crucial piece of knowledge has been learned despite the failure of what was planned. Mentors give scientific trainees a set of strategies for responding to particular failures, and they also give reassurance that even good scientists fail.

Scientific knowledge is built by actual humans who don’t have perfect foresight about the features of the world as yet undiscovered, humans who don’t have perfectly precise instruments (or hands and eyes using those instruments), humans who sometimes mess up in executing their protocols. Yet the knowledge is built, and it frequently works pretty well.

In the context of scientific training, it strikes me as malpractice to send new scientists out into the world with the expectation that all of their studies should work, and without any experience grappling with studies that don’t work. Shielding his students from their Kobayashi Maru is just one more way Diederik Stapel cheated them out of a good scientific training.

Failing the scientists-in-training: inside the frauds of Diederik Stapel (part 2).

In this post, I’m continuing my discussion of the excellent article by Yudhijit Bhattacharjee in the New York Times Magazine (published April 26, 2013) on social psychologist and scientific fraudster Diederik Stapel. The last post considered how being disposed to expect order in the universe might have made other scientists in Stapel’s community less critical of his (fabricated) results than they could have been. Here, I want to shift my focus to some of the harm Stapel did beyond introducing lies to the scientific literature — specifically, the harm he did to the students he was supposed to be training to become good scientists.

I suppose it’s logically possible for a scientist to commit misconduct in a limited domain — say, to make up the results of his own research projects but to make every effort to train his students to be honest scientists. This doesn’t strike me as a likely scenario, though. Publishing fraudulent results as if they were factual is lying to one’s fellow scientists — including the generation of scientists one is training. Moreover, most research groups pursue interlocking questions, meaning that the questions the grad students are working to answer generally build on pieces of knowledge the boss has built — or, in Stapel’s case, “built”. This means that, at a minimum, a fabricating PI is probably wasting his trainees’ time by letting them base their own research efforts on claims that there’s no good scientific reason to trust.

And as Bhattacharjee describes the situation for Stapel’s trainees, things for them were even worse:

He [Stapel] published more than two dozen studies while at Groningen, many of them written with his doctoral students. They don’t appear to have questioned why their supervisor was running many of the experiments for them. Nor did his colleagues inquire about this unusual practice.

(Bold emphasis added.)

I’d have thought that one of the things a scientist-in-training hopes to learn in the course of her graduate studies is not just how to design a good experiment, but how to implement it. Making your experimental design work in the real world is often much harder than it seems like it will be, but you learn from these difficulties — about the parameters you ignored in the design that turn out to be important, about the limitations of your measurement strategies, about ways the system you’re studying frustrates the expectations you had about it before you were actually interacting with it.

I’ll even go out on a limb and say that some experience doing experiments can make a significant difference in a scientist’s skill conceiving of experimental approaches to problems.

That Stapel cut his students out of doing the experiments was downright weird.

Now, scientific trainees probably don’t have the most realistic picture of precisely what competencies they need to master to become successful grown-up scientists in a field. They trust that the grown-up scientists training them know what these competencies are, and that these grown-up scientists will make sure that they encounter them in their training. Stapel’s trainees likely trusted him to guide them. Maybe they thought that he would have them conducting experiments if that were a skill that would require a significant amount of time or effort to master. Maybe they assumed that implementing the experiments they had designed was just so straightforward that Stapel thought they were better served working to learn other competencies instead.

(For that to be the case, though, Stapel would have to be the world’s most reassuring graduate advisor. I know my impostor complex was strong enough that I wouldn’t have believed I could do an experiment my boss or my fellow grad students viewed as totally easy until I had actually done it successfully three times. If I had to bet money, it would be that some of Stapel’s trainees wanted to learn how to do the experiments, but they were too scared to ask.)

There’s no reason, however, that Stapel’s colleagues should have thought it was OK that his trainees were not learning how to do experiments by taking charge of doing their own. If they did know and they did nothing, they were complicit in a failure to provide adequate scientific training to trainees in their program. If they didn’t know, that’s an argument that departments ought to take more responsibility for their trainees and to exercise more oversight rather than leaving each trainee to the mercies of his or her advisor.

And, as becomes clear from the New York Times Magazine article, doing experiments wasn’t the only piece of standard scientific training of which Stapel’s trainees were deprived. Bhattacharjee describes the revelation when a colleague collaborated with Stapel on a piece of research:

Stapel and [Ad] Vingerhoets [a colleague of his at Tilburg] worked together with a research assistant to prepare the coloring pages and the questionnaires. Stapel told Vingerhoets that he would collect the data from a school where he had contacts. A few weeks later, he called Vingerhoets to his office and showed him the results, scribbled on a sheet of paper. Vingerhoets was delighted to see a significant difference between the two conditions, indicating that children exposed to a teary-eyed picture were much more willing to share candy. It was sure to result in a high-profile publication. “I said, ‘This is so fantastic, so incredible,’ ” Vingerhoets told me.

He began writing the paper, but then he wondered if the data had shown any difference between girls and boys. “What about gender differences?” he asked Stapel, requesting to see the data. Stapel told him the data hadn’t been entered into a computer yet.

Vingerhoets was stumped. Stapel had shown him means and standard deviations and even a statistical index attesting to the reliability of the questionnaire, which would have seemed to require a computer to produce. Vingerhoets wondered if Stapel, as dean, was somehow testing him. Suspecting fraud, he consulted a retired professor to figure out what to do. “Do you really believe that someone with [Stapel’s] status faked data?” the professor asked him.

“At that moment,” Vingerhoets told me, “I decided that I would not report it to the rector.”

Stapel’s modus operandi was to make up his results out of whole cloth — to produce “findings” that looked statistically plausible without the muss and fuss of conducting actual experiments or collecting actual data. Indeed, since the thing he was creating that needed to look plausible enough to be accepted by his fellow scientists was the analyzed data, he didn’t bother making up raw data from which such an analysis could be generated.

Connecting the dots here, this surely means that Stapel’s trainees must not have gotten any experience dealing with raw data or learning how to apply methods of analysis to actual data sets. This left another gaping hole in the scientific training they deserved.

It would seem that those being trained by other scientists in Stapel’s program were getting some experience in conducting experiments, collecting data, and analyzing their data — since that experimentation, data collection, and data analysis became fodder for discussion in the ethics training that Stapel led. From the article:

And yet as part of a graduate seminar he taught on research ethics, Stapel would ask his students to dig back into their own research and look for things that might have been unethical. “They got back with terrible lapses­,” he told me. “No informed consent, no debriefing of subjects, then of course in data analysis, looking only at some data and not all the data.” He didn’t see the same problems in his own work, he said, because there were no real data to contend with.

I would love to know the process by which Stapel’s program decided that he was the best one to teach the graduate seminar on research ethics. I wonder if this particular teaching assignment was one of those burdens that his colleagues tried to dodge, or if research ethics was viewed as a teaching assignment requiring no special expertise. I wonder how it’s sitting with them that they let a now-famous cheater teach their grad students how to be ethical scientists.

The whole “those who can’t do, teach” adage rings hollow here.