Chad has posted an interesting discussion of a study of students’ academic performance and how it correlates with their evaluations of the faculty teaching them. The study in question is Carrell, S., & West, J. (2010). Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors. Journal of Political Economy, 118(3), 409-432 (DOI: 10.1086/653808). Go read Chad’s post for a detailed discussion of the methodology of the study, since it will likely answer your questions about my quick overview here. After the overview, I’m going to offer a few more thoughts on the explanations the study authors propose for their findings.
The study, done with data from the U.S. Air Force Academy (where there is a large-ish set of courses all students are required to take, to which students are assigned at random, and which are evaluated on the basis of common exams in which faculty are not necessarily grading their own students, etc.), found that:
- There was a small but statistically significant correlation between a faculty member’s academic rank, teaching experience, and terminal degree status and the average grades of the students in that faculty member’s courses. However, it was a negative correlation — which is to say, the students of the less experienced instructors did slightly better in the Calculus I sections they had with those instructors than did the students taking Calculus I from more experienced instructors.
- In the courses in the required sequence after Calculus I, there was a small but statistically significant correlation between students’ performance and their performance in Calculus I. Again, though, this is a negative correlation — the students with better grades in Calculus I tended to end up with worse grades in the subsequent course in the sequence. (A toy numerical sketch of both of these negative correlations follows this list.)
- Student evaluations of their instructors were positively correlated with the grades they earned in the instructor’s course — higher grades correlated with more favorable evaluation of instructor performance and vice versa.
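To make the direction of those correlations concrete, here’s a minimal toy sketch (numbers I made up for illustration, not anything from the Carrell and West data): instructor experience against average intro-course grades, and students’ Calculus I grades against their grades in the follow-on course.

```python
# Toy illustration only -- invented numbers, not the Carrell & West data.
from scipy.stats import pearsonr

# (a) Hypothetical section-level data: instructor's years of teaching
#     experience vs. that section's average Calculus I grade.
years_experience  = [1, 2, 3, 5, 8, 12, 15, 20]
intro_section_avg = [3.1, 3.0, 3.0, 2.9, 2.8, 2.8, 2.7, 2.6]

# (b) Hypothetical student-level data: grade in Calculus I vs. grade in the
#     next required course in the sequence.
calc1_grade    = [4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0]
followon_grade = [2.7, 2.9, 2.8, 3.0, 3.1, 3.0, 3.3]

r_a, p_a = pearsonr(years_experience, intro_section_avg)
r_b, p_b = pearsonr(calc1_grade, followon_grade)

print(f"(a) experience vs. intro section average: r = {r_a:+.2f} (p = {p_a:.3f})")
print(f"(b) Calc I grade vs. follow-on grade:     r = {r_b:+.2f} (p = {p_b:.3f})")
```

The actual study estimates these effects with far more careful controls, of course; the sketch is only meant to show what a negative correlation in each case amounts to.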
So, what’s the cause of this? Nice as their dataset is, they don’t have any way to really work that out, but they offer three possible explanations: First, that less experienced instructors are more likely to adhere strictly to the common curriculum, and are thus effectively “teaching to the test,” while their more experienced colleagues teach a broader range of stuff that leads to better understanding and thus better performance in future classes. The second possibility is that students whose introductory course instructors are “teaching to the test” pick up bad study habits, which come back and bite them in later courses. The final explanation, which even the authors characterize as “cynical,” is that students who get low-value-added professors in the introductory course put out more effort in the subsequent courses, in order to pick up their GPA.
They are admirably cautious in drawing conclusions based on all this. About the strongest statement they make in the paper is the concluding paragraph: “Regardless of how these effects may operate, our results show that student evaluations reward professors who increase achievement in the contemporaneous course being taught, not those who increase deep learning. Using our various measures of teacher quality to rank-order teachers leads to profoundly different results. Since many U.S. colleges and universities use student evaluations as a measurement of teaching quality for academic promotion and tenure decisions, this finding draws into question the value and accuracy of this practice.”
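To get a feel for what rank-ordering teachers by different measures and getting “profoundly different results” might look like, here’s a toy sketch of my own (invented numbers, not the paper’s data): rank the same instructors once by how their students do in the intro course and once by how those students do in the follow-on course, then compare the two orderings.

```python
# Toy sketch (invented numbers, not the paper's data) of ranking the same
# instructors by two different quality measures and comparing the orderings.
from scipy.stats import rankdata, spearmanr

instructors     = ["A", "B", "C", "D", "E", "F"]
intro_grades    = [3.2, 3.1, 3.0, 2.9, 2.8, 2.7]  # "contemporaneous" measure
followon_grades = [2.6, 2.8, 2.7, 3.0, 2.9, 3.1]  # "deep learning" measure

# Rank 1 = best under each measure.
rank_intro    = rankdata([-g for g in intro_grades])
rank_followon = rankdata([-g for g in followon_grades])

for name, r1, r2 in zip(instructors, rank_intro, rank_followon):
    print(f"Instructor {name}: rank {int(r1)} by intro grades, rank {int(r2)} by follow-on grades")

# How much do the two orderings agree?
rho, _ = spearmanr(intro_grades, followon_grades)
print(f"Agreement between the two orderings (Spearman rho): {rho:+.2f}")
```

Under the study’s findings, the two orderings would tend to disagree, which is exactly why it matters which measure a promotion-and-tenure committee looks at.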
The explanations the researchers offer don’t seem too unreasonable. Indeed, given how little attention is typically paid to preparing Ph.D. candidates to be classroom teachers, what this study found seems almost predictable.
Most of the teaching experience people get during their graduate training is as teaching assistants — maybe they lead recitation sections that include lecture-like presentations, or maybe they mostly field questions. Due to the “weed ’em out” structuring of math and science curricula at many of the institutions where graduate students are trained, there are usually many, many more introductory sections that need T.A.s than intermediate and advanced sections (since you’ve failed or scared away enough of the intro students that you don’t need the same number of sections of courses further on in the sequence). And given the emphasis placed on research over teaching in most graduate programs, graduate students typically don’t pick up much teaching (even T.A.ing) beyond the bare minimum that is required. In my chemistry graduate program, the requirement was that you T.A. for the equivalent of one academic year (and most people finished that in their first year).
Which is to say, when we talk about newish faculty having relatively little teaching experience, it may be a lot less than you expect (especially if you’re a parent writing the check for tuition).
Given their experience being students, though, graduate T.A.s are often remarkably focused on helping their own students not to freak out. They work hard to find ways to make the material make sense, if for no other reason than how draining it can be to deal with panicky students in recitation sections week after week. If the job is to get students through the intro material without imploding, they can do that.
As the study suggests, though, getting through the first course in the sequence can maybe be accomplished in a way that is suboptimal when it comes to laying the groundwork for the next course, or the course after that.
Some of this may be due to the focus on the immediate task (getting students through the course in progress at the time) rather than pulling back to tell one’s panicky students, “Here’s how you will use and extend what we’re doing here in the next course.” In chemistry, there was also the issue that subsequent courses in the sequence moved beyond the simpler models that worked well enough in the earlier courses (so an intro student’s grasp of “orbitals” could be pretty much confined to atomic orbitals, while in later courses understanding “orbitals” meant understanding hybrid or molecular orbitals). Some students who did really well in their introductory courses were not very good at letting go of the models that got them through.
Especially because abandoning those simple models in favor of the more complicated ones could feel like saying that the T.A. with all the empathy, who made that weeder class both do-able and enjoyable, must have lied to you by pointing you toward those simple models.
To my mind, this is a more complicated situation than students picking up inadequate study skills or teachers just teaching to the test. Students are often surprised that learning a subject requires learning a sequence of increasingly sophisticated models, or increasingly sophisticated analytical techniques or methods of approximation, or what have you. Learning the next chunk of knowledge in the line is not just a matter of adding more on, but also of recognizing the problems with the chunk of knowledge you learned before. That this can throw students out of equilibrium is not necessarily something you notice as a newish instructor trying to establish your own equilibrium.
It’s certainly not the kind of thing you notice until you’re teaching one of those courses that is later in the sequence, where you are helping students cope with the realization that what they thought they knew is a lie (of the useful sort that simple models or methods often are).
Do students who do worse in intro courses already have a more pragmatic view of the simpler models or methods transmitted in those courses? I have no idea. But I have seen students who did well in intro courses flounder in the courses that follow upon them because of an overinflated confidence in the soundness of the knowledge they had mastered at the introductory level. Testing whether something like this is really going on would require more information than is contained in the data about grades, though.
One conclusion of this study, that student evaluations of faculty performance don’t indicate that the students have learned all that we want them to, is no surprise at all. This is part of why institutions that care about teaching hardly ever rely on student evaluations of teaching as the only source of data to evaluate faculty teaching. (At my university, for example, there is regular peer reviewing of teaching, and these peer reviews are important in retention, tenure, and promotion decisions.)
But we shouldn’t conclude from this that student evaluations of teaching are worthless, if for no other reason than that students’ experiences of their learning in the classroom matter. Keeping them entertained but teaching them nothing would obviously be a problem (and one we ought to be able to track through evaluation tools like grades), but so too would teaching students a great deal while leaving them traumatized in the process.
That would be no way to encourage students to seek out more knowledge.
So far as student evaluations go, some students will rate the instructor the best ever, some will demand that the instructor be fired for incompetence, with most students somewhere in between. I have found student evaluations useful to tell me what students liked and did not like about my teaching.
Even in an introductory course, which I teach from a somewhat historical perspective, models change with time. I make a particular point, when discussing genes, that our understanding, and therefore the correct answers on tests, will change as the models we use become more modern.
“In the courses in the required sequence after Calculus I, there was a small but statistically significant correlation between students’ performance and their performance in Calculus I. Again, though, this is a negative correlation — the students with better grades in Calculus I tended to end up with worse grades in the subsequent course in the sequence.”
Isn’t that called regression toward the mean?
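For what it’s worth, here’s a quick toy simulation (my own invented setup, nothing from the study) of what pure regression toward the mean looks like: if each grade is ability plus noise, the students who score highest in the first course will, on average, score lower in the second even when the teaching is identical.

```python
# Toy simulation of regression toward the mean (not the study's data):
# each "grade" is a student's true ability plus independent noise.
import random

random.seed(0)
n_students = 10_000

ability = [random.gauss(3.0, 0.4) for _ in range(n_students)]
grade1 = [a + random.gauss(0, 0.4) for a in ability]   # Calculus I
grade2 = [a + random.gauss(0, 0.4) for a in ability]   # follow-on course

# Students in the top 10% of Calculus I grades tend to score closer to the
# overall mean the second time, with no teaching effect involved at all.
cutoff = sorted(grade1)[int(0.9 * n_students)]
top = [i for i in range(n_students) if grade1[i] >= cutoff]

avg1 = sum(grade1[i] for i in top) / len(top)
avg2 = sum(grade2[i] for i in top) / len(top)
print(f"Top decile in Calc I: average {avg1:.2f} there, {avg2:.2f} in the next course")
```

Whether that mechanism, rather than a real difference in teaching, is behind the pattern in the study is just the sort of question the study’s design (random assignment and common exams) is supposed to help sort out.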