In recent days, there have been signs on the horizon of an impending blogwar. Prof-like Substance fired the first volley:
[A]lmost all major genomics centers are going to a zero-embargo data release policy. Essentially, once the sequencing is done and the annotation has been run, the data is on the web in a searchable and downloadable format.
Yikes.
How many other fields put their data directly on the web before those who produced it have the opportunity to analyze it? Now, obviously no one is going to yank a genome paper right out from under the group working on it, but what about comparative studies? What about searching out specific genes for multi-gene phylogenetics? Where is the line for what is permissible to use before the genome is published? How much of a grace period do people get with data that has gone public, but that they* paid for?
—–
*Obviously we are talking about grant-funded projects, so the money is taxpayer money, not any one person’s. Nevertheless, someone came up with the idea and got it funded, so there is some ownership there.
Then, Mike the Mad Biologist fired off this reply:
Several of the large centers, including the one I work at, are funded by NIAID to sequence microorganisms related to human health and disease (analogous programs for human biology are supported by NHGRI). There’s a reason why NIH is hard-assed about data release:
Funding agencies learned this the hard way, as too many early sequencing centers resembled ‘genomic roach motels’: DNA checks in, but sequence doesn’t check out.
The funding agencies’ mission is to improve human health (or some other laudable goal), not to improve someone’s tenure package. This might seem harsh unless we remember how many of these center-based genome projects are funded. The investigator’s grant is not paying for the sequencing. In the case of NIAID, there is a white paper process. Before NIAID will approve the project, several goals have to be met in the white paper (Note: while I’m discussing NIAID, other agencies have a similar process, if different scientific objectives).
Obviously, the organism and collection of strains to be sequenced have to be relevant to human health. But the project also must have significant community input. NIAID absolutely does not want this to be an end-run around R01 grants. Consequently, these sequencing projects should not be a project that belongs to a single lab, and which lacks involvement by others in the subdiscipline (“this looks like an R01” is a pejorative). It also has to provide a community resource. In other words, data from a successful project should be used rapidly by other groups: that’s the whole point (otherwise, write an R01 proposal). The white paper should also contain a general description of the analysis goals of the project (and, ideally, who in the collaborative group will address them). If you get ‘scooped’, that’s, in part, a project planning issue.
NIAID, along with other agencies and institutes, is pushing hard for rapid public release. Why does NIAID get to call the shots? Because it’s their money.
Which brings me to the issue of ‘whose’ genomes these are. The answer is very simple: NIH’s (and by extension, the American people’s). As I mentioned above, NIH doesn’t care about your tenure package, or your dissertation (given that many dissertations and research programs are funded in part or in their entirety by NIH and other agencies, they’re already being generous†). What they want is high-quality data that are accessible to as many researchers as possible as quickly as possible. To put this (very) bluntly, medically important data should not be held hostage by career notions. That is the ethical position.
Prof-like Substance hurled back a hefty latex pillow of a rejoinder:
People feel like anything that is public is free to use, and maybe they should. But how would you feel as the researcher who assembled a group of researchers from the community, put a proposal together, drummed up support from the community outside of your research team, produced and purified the sample to be sequenced (which is not exactly just using a Sigma kit in a LOT of cases), dealt with the administration issues that crop up along the way, pushed the project through (another aspect woefully underappreciated) the center, got your research community together once the data were in hand to make sense of it all and herded the cats to get the paper together? Would you feel some ownership, even if it was public dollars that funded the project?
Now what if you submitted the manuscript and then opened your copy of Science and saw the major finding that you centered the genome paper around has been plucked out by another group and published in isolation? Would you say, “well, the data’s publicly available, what’s unscrupulous about using it?” …
[L]et’s couch this in the reality of the changing technology. If your choice is to have the sequencing done for free, but risk losing it right off the machine, OR to do it with your own funds (>$40,000) and have exclusive right to it until the paper is published, what are you going to choose? You can draw the line regarding big and small centers or projects all you want, but it is becoming increasingly fuzzy.
This is all to get back to my point that if major sequencing centers want to stay ahead of the curve, they have to have policies that are going to encourage, not discourage, investigators to use them.
It’s fair to say that I don’t know from genomics. However, I think the ethical landscape of this disagreement bears closer examination.