Student evaluations are here to stay. And that’s the way it should be. I think universities owe it to students to offer a structured opportunity to provide feedback on classroom experiences. It’s not a matter of “customer service,” but instead, of respecting students and hearing what they have to say. But the way evaluations are typically structured, they facilitate inappropriate application and interpretation, and they don’t ask what we should be asking.
Considering the substantial biases that we see in student evaluations of teaching, we definitely should not be using these for quantitative comparisons of faculty performance. Even written remarks need to be interpreted in light of faculty identity and many additional contexts. These evaluations are not meaningless or useless, but pulling meaning and utility out of them isn’t so easy and it’s hard or impossible to get it right.
So, how do we fix it? I was chatting with another professor in my department in the hallway a few weeks ago, and she had a spectacular idea. I said, “Hey, would you like to write this up as a post?” and she said, “Ha, as if I have the time to do something like that!” “Would you mind if I wrote it up then?” “Sure.” And, ta da.
To make teaching evaluations more effective, instead of using vague Likert-scale ratings, we need to ask unambiguous questions that reflect explicit performance criteria.
For example: Was the instructor late to class on a regular basis?
Or: How many days did it take for you to receive assignments back?
Or: Did you visit scheduled office hours? Was your instructor present at posted hours?
Perhaps: Did the instructor use disparaging language about a student in the class? If so, what did they say?
Maybe: Did the instructor provide safety training in lab?
Or: Did the instructor allow students to eat in lab?
How about: Did you have adequate time to complete exams in class?
What do you think about: Do you have a documented learning disability that was unaccommodated by the instructor?
Or: Did the grading scheme for the course follow the syllabus provided by the instructor?
In essence, the one thing that evaluations could be most useful for is the one thing that isn’t on them: asking students to report whether instructors are meeting baseline performance criteria.
For example, a professor could show up late to class every day, take a month to return assignments (or perhaps not return them at all), lie to students about extra credit, skip out on office hours, and fail to accommodate students with disabilities. Their evaluations might hint at this, but they could sneak away with decent evals while doing all of this, and there wouldn’t be any clear indication that they’re doing anything wrong. The derelict tenured professor who fails to do their job at the minimum level expected of them is a stereotype, not an epidemic. But it’s something that should at least be detectable in student evaluations! And it’s definitely something that should be detected in less experienced faculty as well.
If we are asking students a vague Likert-scale question to evaluate the effectiveness of their instructors, then all kinds of biases are going to be involved and there’s no clear way to account for them. I do think it’s good to ask open-ended questions about which parts of the course students found effective and ineffective, and whether they have any particular praises or grievances they wish to air, because this is part of protecting students from malfeasance. However, those responses may or may not correspond to teaching effectiveness.
I suppose some students could just lie on the teaching evaluations about these criteria. They could claim the instructor was late to class, when in fact they were not. But these are matters of fact, rather than impressions or opinions. They can be verified or investigated if necessary, if a student claims a professor isn’t doing the fundamentals required of their job.
I’m not saying questions about explicit and unambiguous performance criteria will tell us about teaching effectiveness. But most student teaching evaluations that are in use nowadays won’t do that either. We might as well make them useful, eh?