In a recent post, Candid Engineer raised some interesting questions about data and ethics:
When I was a graduate student, I studied the effects of many different chemicals on a particular cell type. I usually had anywhere from n=4 to n=9. I would look at the data set as a whole, and throw out the outlying points. For example, if I had 4 data points with the values 4.3, 4.2, 4.4, and 5.5, I would throw out the 5.5.
Now that I am older, wiser, and more inclined to believe that I am fully capable of acquiring reproducible data, I am more reluctant to throw away the outlying data points. Unless I know there is a very specific reason why a value was off, I’ll keep all of the data. This practice, naturally, results in larger error bars.
And it’s occurred to me that it might not even be ethical to throw out the oddball data points. I’m really not sure. No one has ever taught me this. Any opinions out there?
My two different ways of handling data actually reflect an evolution in the way I think about biological phenomena. When I was a graduate student, I very much believed that there was a “right answer” in my experiments, and that the whole point of me collecting all of that data was to find the single right answer to my question.
But anymore, I’m not so sure that cells contain one right answer. For a lot of phenomena that I study, it’s totally possible that my cells might display a range of behaviors, and who am I to demand that they decide on doing only one thing? As we all know, cells are dynamic beings, and I no longer feel bad about my 20% error bars. I’ve become more accepting that maybe that’s just the way it is.
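(A quick illustration, not from Candid Engineer’s post: here is the n=4 example above in plain Python, just to show how much that single 5.5 drives the spread, and hence the error bars.)

```python
# Candid Engineer's hypothetical four measurements, with and without the 5.5.
from statistics import mean, stdev

all_points = [4.3, 4.2, 4.4, 5.5]
trimmed = [4.3, 4.2, 4.4]  # the 5.5 discarded as an "outlier"

for label, data in (("all points kept", all_points), ("5.5 dropped", trimmed)):
    print(f"{label}: mean = {mean(data):.2f}, sd = {stdev(data):.2f}")

# Keeping everything gives a mean of 4.60 with an sd of about 0.61; dropping
# the 5.5 gives a mean of 4.30 with an sd of 0.10. That is the "larger error
# bars" trade-off in a nutshell.
```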
The shift in attitude that Candid Engineer describes seems pretty accurate to me. When we learn science, it isn’t just problem sets that feed the expectation that there is a right answer. Lab courses do it, too, and sometimes they subject your work to grading schemes that dock points depending on how far your measured results are from the expected answer, regardless of your care and skill in setting up and executing the experiment. Under these circumstances, it can seem like the smart thing to do is to find that right answer by whatever means are available. If you know roughly what the right answer is, throwing out data that don’t fit your expectations seems reasonable — because those are the measurements that have to be wrong.
While this may be a reasonable strategy for getting through the canned “experiments” in a lab course, it’s not such a great strategy when you are involved in actual research — that is, trying to build new knowledge by answering questions to which we don’t know the answers. In those cases, the data are what you have for figuring out the phenomenon, and ultimately you’re accountable to the phenomenon rather than to the expectations you bring to your experimental exploration of it. Your expectations, after all, could be wrong. That’s why you have to do the research rather than just going with your expectations.
This attitudinal shift — and the difficulty one might experience making it — points to another issue Candid Engineer raises: There isn’t as much explicit teaching as there ought to be, especially of graduate students who are learning to do real research, with respect to data handling and analysis. This is why it’s easy for habits picked up in laboratory courses to set down deeper roots in graduate school. Such habits are also encouraged when mentors identify “good data” with “data that shows what we want to show” rather than with robust, reproducible data that gives us insight into features of the system being studied.
Depending on what the phenomena under study are really like — something we’re trying to work out from the data — it’s quite possible that messy data are an accurate reflection of a complicated reality.
Now, in the comments on Candid Engineer’s post, there is much discussion about whether Candid Engineer’s prior strategies of data handling were ethical or scientifically reasonable (as well as about whether other researchers standardly deal with outliers this way, and what methods for dealing with outliers might be better, and so on). So I figured I would weigh in on the ethics side of things.
Different fields have different habits with respect to their treatment of outliers, error bars, and statistical analyses. Doing it differently does not automatically mean doing it wrong.
Whatever methods one is using for dealing with outliers, error bars, and statistical analyses, the watchword should be transparency. Communications about results at all levels (whether in published articles or talking within a lab group) should be explicit about reasons for dropping particular data points, and clear about what kinds of statistical analyses were applied to the data and why.
Honesty is central to scientific credibility, but I think we need to recognize that what counts as honest communications of scientific results is not necessarily obvious to the scientific trainee. The grown-up scientists training new scientists need to take responsibility for teaching trainees the right way to do things — and for modeling the right ways themselves in their own communications with other scientists. After all, if everyone in a field agrees that a particular way of treating outlying data points is reasonable, there can be no harm in stating explicitly, “These were the outliers we left out, and here’s why we left them out.”
On the other hand, if your treatment of the data is a secret you’re trying to protect — even as you’re discussing those results in a paper that will be read by other scientists — that’s a warning sign that you’re doing something wrong.
Hat-tip: Comrade PhysioProf
Of course you plot/report all your data, and if you want to discard some outliers in your analysis you do so at that point…
With a sample size of 4 to 9 cells it is doubtful that any point could have been considered an outlier by any reasonable statistical method. It would have been helpful for Candid Engineer to learn about basic statistics first… maybe the ethical question would have disappeared by then.
Of course many problems do not have a ‘single answer’ but the answer is a distribution. Isn’t this Stats 101?
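For what it’s worth, here is a rough sketch (assuming SciPy is available) of what one standard formal check, a two-sided Grubbs test, says about the four numbers quoted in the post. At n=4 the Grubbs statistic can never exceed (n-1)/sqrt(n) = 1.5 no matter how extreme the odd point is, so the test has almost no room to flag anything, and the verdict flips depending on the significance level you happen to pick. That is the small-sample worry in concrete form.

```python
# A two-sided Grubbs outlier test on the n=4 example from the post.
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

def grubbs(data, alpha=0.05):
    """Return (G, G_crit) for the two-sided Grubbs test at level alpha."""
    n = len(data)
    g = max(abs(x - mean(data)) for x in data) / stdev(data)
    t_crit = t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / sqrt(n) * sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return g, g_crit

data = [4.3, 4.2, 4.4, 5.5]
for alpha in (0.05, 0.01):
    g, g_crit = grubbs(data, alpha)
    print(f"alpha={alpha}: G={g:.3f}, critical={g_crit:.3f}, flagged={g > g_crit}")

# The statistic (about 1.49) and the 5% critical value (about 1.48) land
# almost on top of each other, and the call reverses at the 1% level.
```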
I will note that in the present (unspeakable) funding climate, universities are cutting back on the amount of hands-on labwork that undergraduates do.
Now, I was no fan of those canned labs all those years ago, but one of the few lessons that industry managed to pound through my head is the absolutely essential value of meticulous lab practice. Your experimental design may be good or bad, your execution may be sloppy or letter-perfect, but your documentation must be absolutely exhaustive, clear, and complete.
The message gets inked all over your precious skin when Legal deposes you on the contents of your lab notebook. The ink gets driven in with a roomful of needle-bright eyes when you’re called upon to make a presentation to the management of your employer’s (very large and famous) customer on your root-cause analysis of a problem with a custom product — and you’re making the call on who foots the bill for a revision that’s going to cost a lot more than the contents of your 401(k).
Anyone with data in hand should ask hirself the question: if I had to testify under penalty of perjury about this data, what would I want for documentation?
Janet, thanks for your take on the matter. What has been really shocking to me through the comments is the incredible diversity of experience and training that many of us seem to have.
Commenter Denis, above, for example, asks if I haven’t learned such things in Stats 101. Well, I wish I had, Denis, but I’m a bit ashamed to say that I know almost nothing of statistics because my engineering curriculum never afforded me the time to take a stats class. The only statistics I’ve learned have been self-taught, and the problem with self-teaching is that you don’t always know what is most important to teach yourself. We also never had an ethics course in undergrad. No stats/ethics in high school or grad school, either.
So where are we supposed to pick up this sort of thing? As graduate students, unless our supervisors make a point of sitting down with us when we are early-year students, how are we supposed to understand what is ethical in situations like this where it’s not obvious? I know, personally, I would have been loath to ask my advisor unsolicited to sit down with me to make sure I was doing my data analysis correctly. I felt, at the time, like I knew what I was doing.
I think the real take-home message here is that we all need to be aware that well-meaning students, out of ignorance, may be practicing data analysis that is neither statistically sound nor reproducible. We need to be more cognizant of guiding them in their data analysis, making sure they have adequate background, and teaching them ourselves if they are not properly trained in statistics.
I vaguely recall that mine did, but it wasn’t presented in a way that made it “stick” — but I was an arrogant snot even then, so I can’t put all the blame on my teachers!
However, I do know now how important those stats are to a day-to-day working engineer, because everything we do comes down to stats in the end: yield, error rates, you name it.
That’s also why (getting back to Our Most Excellent Hostess’ point) one has to be exceedingly humble about “outliers.” When your normal operation has to cover a BER [1] of 10^-15, those “outliers” are what it’s all about. Context may provide a sound rationale for excluding them, but there’s always the risk of:
One is prime,
Three is prime,
Five is prime,
Seven is prime,
Nine is experimental error,
Eleven is prime,
Thirteen is prime,
QED.
[1] Bit Error Rate
It might benefit the physical sciences to have a look at the social sciences on this point — $DAUGHTER informs me that the core MA curriculum in sociology includes something like six hours of ground-up research statistics.
Yeah, I know — it’s not like the MS curriculum has lots of blanks. Still, as you point out, it’s important enough that there’s no excuse for leaving it to chance.
And (nods again to Our Hostess) at least a few organized talks on research ethics would be good, too. I’m afraid there’s too little respect for philosophy among the physical sciences. Pity, too — I had a blast in those classes.
Six hours?! In total?
Getting back to the OP, I’m a professional statistician, so these are issues I come across. There’s no consensus on how to deal with outliers: some people say leave them all in because they’re data, some are more liberal. Certainly if you know why a data point is very odd, then remove it. If you don’t, there may still be a good reason but you don’t know it. Or it may be genuine. In these cases, I’m worried about whether it makes a difference to the inference. If it doesn’t, you might as well leave it in. If it does, you can be screwed both ways. What I would do would depend on the context and details, but generally I’d prefer to report the full analysis, and note the effect of the outliers. I’m trying to learn and understand the data, and this is an important part of it.
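As a concrete (and entirely hypothetical) sketch of that “report both” habit, reusing the four numbers from the post and assuming SciPy: compute the same interval estimate with and without the suspect point and show both, so readers can see for themselves how much the inference moves.

```python
# 95% t confidence interval for the mean, reported with and without the
# suspect point rather than silently choosing one version.
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t

def mean_ci(data, level=0.95):
    """Two-sided t confidence interval for the mean of `data`."""
    n = len(data)
    m, s = mean(data), stdev(data)
    half_width = t.ppf(0.5 + level / 2, n - 1) * s / sqrt(n)
    return m - half_width, m + half_width

full = [4.3, 4.2, 4.4, 5.5]  # everything that was measured
trimmed = [4.3, 4.2, 4.4]    # the suspect point removed

for label, data in (("all data kept", full), ("suspect point dropped", trimmed)):
    lo, hi = mean_ci(data)
    print(f"{label}: 95% CI ({lo:.2f}, {hi:.2f})")

# Dropping the one point makes the interval roughly four times narrower, so
# the choice clearly matters to the inference, and both versions belong in
# the write-up.
```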
The bottom line, as Dr. Freeride says, is transparency.
Bob O’H: I’m confident that DC means six credit hours, i.e., two semester courses, each with 40+ hours of classroom time, plus the homework, etc.
I think everyone is right in arguing for transparency, but there is also the struggle to explain things in a non-verbose manner when trying to compile many data sets and experiments into a single journal article, often with word limits. I think it is at that point that transparency and good intentions often fall by the wayside.
Re stats and ethics training, a data point:
I’m a psych grad student, and the standard psych curricula include a semester of undergrad stats (mocked as “statistics for the very afraid” by those of us on the more-mathy end of the distribution of psych students), and two semesters of grad stats. My university is considering adding a requirement for a third semester of quantitative analysis, which I think would be a Very Good Idea. As well, my undergrad curriculum included a semester of ‘Research Methods’, which talked broadly about research ethics, among other things. My grad program includes an ethics “class” (mandated by NIH for students on training grants, and extended to all students in ‘health related sciences’) which covered a lot of issues.
Nonetheless, I find myself frequently making decisions about data that were not ever explicitly covered by any of this training. I try to sanity-check my decisions against other students, and when possible with my PI, but yeah, I’m always wondering what I don’t know about the appropriate standards to apply.
Yes, Ma’am. And this is graduate-level stats; prerequisites not included. It might also be more than six hours; $DAUGHTER has been a bit enthusiastic about methods since her undergraduate days, and I won’t swear I was taking good notes as we chatted.
Sounds like a good chance for some research methods type to get some publications out of formalizing commonly-used strategies. That way, a researcher could simply note that anomalous data were screened using Gilligan’s Method and that would be that.
Coincidence that this discussion is happening this week, as is this paper?
I was taught by my PI never to omit outliers, to include them in the stats, and to, if necessary, try to explain them (or explain them away) in the discussion (or even a footnote). Still, the focus was on the mean, mentally ignoring the outliers. But I was most excited about the outliers: why do a few birds do something different than most of them? If I figure out why they respond differently to the same treatment, I will learn something about the system itself – it HAS to be such as to allow for both types of responses. I made nice discoveries using that way of thinking, focusing on the outliers and not the mainstream.
As the saying goes, the most important exclamation in science isn’t “Eureka!” but “Hmmm. That’s odd …” Or, I confess, a more physioprofic outburst.
The vast majority of data collected in a research laboratory is never published. Scientists throw out fucktons of data when they know it is unreliable or meaningless for one reason or another. The expectation that all of those discarded data must be mentioned in manuscripts that report non-discarded data is ridiculous, and would destroy the utility of the scientific literature. It is ridiculous for the same reason that so-called “open notebook” science is ridiculous.
We simply have no choice but to trust our fellow scientists’ judgment in choosing what data to reveal to the world, and what data to ignore. I don’t have the fucking time or inclination to want to know all the various circumstances and justifications for my colleagues discarding data.
I’d be a little worried about eliminating outliers even if you know why they’re “wrong.” Unless you have some sort of effective blind (which I think you usually don’t in these cases), won’t unavoidable bias make you more sensitive to “obvious” wrongs in one direction than in another? Or, if you’re extra-careful-scrupulous, make you bend over too far backwards and be more sensitive to “obvious” wrongs on the side you *do* like?
If you have the luxury of a big N, I’d think the right thing to do would be to faithfully include all data points and assume the outliers will come out in the wash. If you don’t…well, as a layperson, I have to admit that I was surprised to hear there even *were* experiments with N in the single digits. I mean, won’t error effects (whether obvious or not) swamp everything else at such small numbers?