We can create useful student evaluations of teaching. Here’s how.


Student evaluations are here to stay. And that’s the way it should be. I think universities owe it to students to offer a structured opportunity to provide feedback on classroom experiences. It’s not a matter of “customer service,” but instead, of respecting students and hearing what they have to say. But the way evaluations are typically structured, they facilitate inappropriate application and interpretation, and they don’t ask what we should be asking.

Considering the substantial biases that we see in student evaluations of teaching, we definitely should not be using these for quantitative comparisons of faculty performance. Even written remarks need to be interpreted in light of faculty identity and many additional contexts. These evaluations are not meaningless or useless, but pulling meaning and utility out of them isn’t so easy and it’s hard or impossible to get it right.

So, how do we fix it? I was chatting with another professor in my department in the hallway a few weeks ago, and she had a spectacular idea. I said, “Hey, would you like to write this up as a post?” and she said, “Ha, as if I have the time to do something like that!” “Would you mind if I wrote it up then?” “Sure.” And, ta da.

To make teaching evaluations more effective, instead of using vague Likert-scale ratings, we need to ask unambiguous questions that reflect explicit performance criteria.

For example: Was the instructor late to class on a regular basis?

Or: How many days did it take for you to receive assignments back?

Or: Did you visit scheduled office hours? Was your instructor present at posted hours?

Perhaps: Did the instructor use disparaging language about a student in the class? If so, what did they say?

Maybe: Did the instructor provide safety training in lab?

Or: Did the instructor allow students to eat in lab?

How about: Did you have adequate time to complete exams in class?

What do you think about: Do you have a documented learning disability that was unaccommodated by the instructor?

Or: Did the grading scheme for the course follow the syllabus provided by the instructor?

In essence, the one thing that evaluations could be most useful for is the one thing that isn’t on them: asking students to report whether instructors are meeting baseline performance criteria.

For example, a professor could be showing up late to class every day, taking a month to return assignments (or perhaps not returning them at all), lying to students about extra credit, skipping out on office hours, and failing to accommodate students with disabilities. Their evaluations might hint at this, but they could sneak away with decent evals while doing all of this, and there wouldn’t be any clear indication that they’re doing anything wrong. The stereotype of the derelict tenured professor who fails to do their job at the minimum level expected of them is not an epidemic. But it’s something that should at least be detectable in student evaluations! And it’s definitely something that should be detected in less experienced faculty as well.

If we are asking students a vague Likert scale question to evaluate the effectiveness of their instructors, then all kinds of biases are going to be involved and there’s no clear way to account for them. I do think it’s good to ask open-ended questions about what students thought were effective and ineffective parts of the course, and if they have any particular praises or grievances they wish to air, because this is part of protecting students from malfeasance. However, those responses may or may not correspond to teaching effectiveness.

I suppose some students could just lie on the teaching evaluations about these criteria. They could claim the instructor was late to class, when in fact they were not. But these are matters of fact, rather than impressions or opinions. They can be verified or investigated if necessary, if a student claims a professor isn’t doing the fundamentals required of their job.

I’m not saying questions about explicit and unambiguous performance criteria will tell us about teaching effectiveness. But most student teaching evaluations that are in use nowadays won’t do that either. We might as well make them useful, eh?

9 thoughts on “We can create useful student evaluations of teaching. Here’s how.”

  1. Terry, in your experience are many instructors late to class? Do many instructors allow blatant violations of lab safety rules? Do many instructors ignore the grading scheme described in the syllabus? Do many instructors ignore their posted office hours? Do many instructors give exams that many students can’t complete in the time available? And when instructors do those things, do students not mention them on teaching evaluation forms? I ask because in my admittedly-anecdotal experience here at Calgary, no instructors do any of those things. So I confess I actually see your questions as more difficult to extract useful information from than the standard sorts of questions asked on student evaluation forms. Your suggested questions just ask students to confirm that the instructor meets the bare minimum standards of professional competence in certain quite specific ways. The vast majority of instructors, and the higher-ups charged with evaluating them, already know that they meet the bare minimum standards of professional competence.

    Plus, my experience here at Calgary is that, when asked open-ended questions on anonymous evaluation forms, students would volunteer the information that the instructor was consistently late to class (or whatever).

    I suppose the sorts of specific questions you pose might be useful in rare cases of derelict faculty. But I’m not even sure about that, because there are so many specific ways in which an instructor might be derelict. A form like the one you suggest seems likely to overlook some rare problems by not listing them.

    In saying all of this, I don’t mean to downplay the importance of identifying derelict instructors–it’s very important, even if they’re rare (as I believe they are).

    Am I drastically overgeneralizing from my own admittedly-anecdotal experience here? Looking forward to hearing about the experiences of others.

    • If everybody at your university is meeting the basic expectations, that’s wonderful. Everywhere I have worked, students have volunteered stories about a small number of faculty who have not met these expectations. (I’m not seeking these out, but when listening, I hear them.) I’ve known of professors who regularly insult religious students. Professors who are chronically late. One professor who actually prepared food in a microbiology lab. Professors who skip the safety training. Professors who take over a month to return midterms. And of course, many professors who sexually harass their students.

    • I agree. I will say that when I was a TA, back in the last century, there was a very antiquated evaluation form that asked 100 or so questions like this, except that they were better and less like spying. Things like: do you think the blackboard was used well, do you think the class moved at a good pace, etc. I found the responses were helpful and were not always what I would have expected. So: I see the value of very concrete questions, but these questions seem to ask the students to think of themselves as a kind of police, and at the same time seem designed to get a top rating for anyone who meets some minimum standards.

  2. Totally agree, and nicely put. Current evaluation tools are usually developed over many months (or years) of contentious debate, and may be hollow shells of their initial form. I wonder if there is a more effective way to craft meaningful teaching evaluation questions that won’t be watered down in the process.

  3. Perhaps this would be an improvement, but one issue that you should consider is that there have been studies showing that even on objective criteria like these, women are rated lower. I don’t have the reference offhand, but it’s the study that was reported on a couple of years ago where they messed with the gender of online instructors: there were a male and a female instructor, who each taught two sections of a course, one under a male name (so the students believed they were male) and one under a female name (so the students believed they were female). One of the course evaluation scales was about promptness of returning assignments, and even though assignments were returned at the exact same time in all cases, students rated the instructor they believed to be female lower on promptness than the instructor they believed to be male (regardless of the instructor’s actual gender). So, it’s worth noting that in the literature gender bias can creep into even seemingly objective measures like whether or not an assignment was returned promptly. Maybe asking for a number of days would help, but I suspect not as much as we hope it would.

  4. I taught with someone who regularly made exams that were difficult to complete in the time allotted. A constant complaint I hear about another instructor at my institution is that the exams are virtually impossible to complete in the time allotted, so much so that when I teach a class right after that course, I can tell there has been an exam because a number of students arrive late to my class.
