When surveys fail.

[Steve’s note: If you have any relationship to surveys – and these days, practically anyone in business has a relationship with a customer service survey – this is important to you. This is a critique of a survey used at my university. It doesn’t measure what it claims to. It doesn’t even do a very good job measuring what it does measure. But we keep using it. This critique not only points out the flaws; at the end I also include an alternate survey that could do a far better job. It only took me about half an hour to make. Hopefully this critique – and my proposed change – will help you work with the surveys you have to deal with so that you get the information you want and need.]

“I hate student evaluations,” she said. She teaches at the same university that I attend, though I’ve never taken a class with her.

“It’s better than RateMyProfessor.com,” I replied. “I’ve seen some really stupid comments there.”

She was unconvinced. “I have to use this stupid book for the class. I don’t have any control over that, so I have a whole bunch of PowerPoints and supplementary materials to go with the class. Yet on the question that asks if course materials contributed to the course, I get poor marks.” She stopped me before I could ask the next, obvious question. “They like the PowerPoints. They hate the book. They say so in the comments section. But when it asks about course materials, they always think about the book they have to buy for the course.”

Students and instructors alike hate the “Student Evaluation of Instruction” my university uses, and rightly so. For an instrument that purports to capture the efficacy of instructional pedagogy, it has a frightfully poor design, both in visual layout and in terms of data collection.

To be fair, it does possess a few strengths. The instrument is simple. The directions are succinct, clear, and clearly visible on the form. It is very short, with a few demographic questions, seven Likert scale questions, and three open-ended questions at the end. It can easily be completed within five minutes. The inclusion of open-ended questions, with ample space to answer them, is a considerable bonus for capturing the fullness of the student experience.

That said, the weaknesses of the instrument far outweigh its relatively minor strengths. Perhaps the largest weakness is that, despite the explicit promise of anonymity, the instrument cannot actually deliver it. The open-ended questions, especially in a course where a significant amount of composition has occurred, readily lend themselves to identifying the respondent. Should the instructor obtain the individual responses, it may be possible to determine a student’s identity from even the minimal amount of demographic data collected.
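To make that de-anonymization risk concrete, here is a minimal sketch in Python. The roster, the demographic fields, and the values are all made up for illustration; the actual instrument’s questions may differ. The point is simply that in a small class, a handful of demographic answers is often enough to single out one student.

```python
# Count how many students in a small, hypothetical class are uniquely
# identified by just three demographic answers. (Illustrative data only.)
from collections import Counter

roster = [
    ("junior",    "College of Arts & Sciences", "A"),
    ("senior",    "College of Arts & Sciences", "B"),
    ("sophomore", "College of Business",        "A"),
    ("junior",    "College of Business",        "C"),
    ("senior",    "College of Arts & Sciences", "A"),
]

combo_counts = Counter(roster)
unique = [combo for combo, n in combo_counts.items() if n == 1]
print(f"{len(unique)} of {len(roster)} students have a unique demographic combination")
# In a small seminar, most combinations of class year, college, and expected
# grade point to exactly one student -- so "anonymous" comments can often be
# traced back to their author.
```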

Many questions omit possible – and common – responses. In the demographic section, students are asked to list their college or school, but the list omits the School of Graduate Studies. The question asking what grade the student expects to receive omits the possibility that the class is simply a PASS/FAIL course.

Throughout the Likert scale questions, there is no provision for “Does not apply”. For example, a seminar course may not require (or even reasonably expect) students to consult with the instructor. Or a student may never have needed to consult with the instructor, and therefore has no basis for rating the instructor’s availability.
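The missing option is not just an annoyance; it distorts the resulting numbers. Here is a minimal sketch, in Python with made-up response values, of how an explicit “Does not apply” answer could be kept separate from the 1-5 scale instead of forcing students who cannot rate the item to pick something anyway:

```python
# Hypothetical tabulation showing why a "Does not apply" option matters.
# Ratings are on a 1-5 Likert scale; None marks "Does not apply".
# (Illustrative values only -- not data from the actual instrument.)

responses_with_na = [5, 4, None, 5, None, 4]   # N/A kept separate
responses_forced  = [5, 4, 3, 5, 3, 4]         # same students forced onto the midpoint

def mean_excluding_na(scores):
    """Average only the substantive ratings, ignoring N/A responses."""
    rated = [s for s in scores if s is not None]
    return sum(rated) / len(rated)

print(mean_excluding_na(responses_with_na))           # 4.5 -- reflects students who could rate
print(sum(responses_forced) / len(responses_forced))  # 4.0 -- dragged down by non-raters
```

Keeping the two cases distinct means the instructor’s average reflects only the students who actually had the experience being rated.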

As noted above, “course materials” is not properly conceptualized or operationalized, resulting in respondent confusion between the text and the other materials provided by the instructor. Even if the situation were reversed – with substandard supplementary materials and a masterpiece of a text – the instrument makes it impossible to evaluate the two separately.

Questions four through six, all nominally asking about the respondent’s learning throughout the course, fail to take into account the prior knowledge the student brought to it. Question seven asks whether the student was motivated coming into the course, but not whether that attitude changed over the course of the term.

The open-ended questions leave approximately a quarter of the instrument blank, an intimidating amount of whitespace for responses. This creates the perception that the space must be filled – and that can discourage respondents. Further, the instrument is often administered either when students feel pressured to finish quickly so that class or a test may begin, or at the end of a class period, when they have every incentive to leave the classroom and go on about their day.

Finally, this instrument merely measures self-reported perception of the instructor’s technique, while leaving the actual efficacy of the instructor’s pedagogy unmeasured. For example, an extremely lenient instructor may receive high marks on this instrument (despite not doing their job well), while a more demanding instructor may receive lower marks while instructing students more successfully. This is compounded by the instructor’s conflict of interest: future rewards (e.g., salary and tenure) are influenced by pleasing the respondents, and pleasing respondents is not equivalent to teaching them well.

Part of revising this instrument will involve revising the manner of its administration, so that respondents do not feel pressured to complete it quickly. Further, this instrument cannot be considered a comprehensive evaluation of instructional technique without assessing student performance in the class. That kind of revamping is beyond the scope of this project, but it would involve (at a minimum) determining the correlation between perception and actual performance, and ideally would include making the responses confidential rather than anonymous.
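As a rough illustration of what that correlation check might look like, here is a minimal sketch using made-up section-level numbers (mean evaluation rating versus mean exam score per section); nothing here comes from the actual instrument or from real classes:

```python
# Minimal sketch of the perception-vs-performance correlation check described
# above, using hypothetical section-level numbers (not real data).
from statistics import correlation  # Pearson's r; requires Python 3.10+

eval_scores = [4.8, 4.5, 3.9, 4.2, 3.5]  # hypothetical mean evaluation ratings per section
exam_scores = [72, 81, 85, 78, 88]       # hypothetical mean final exam scores per section (%)

r = correlation(eval_scores, exam_scores)
print(f"Pearson r between perceived and actual performance: {r:.2f}")
# A weak or negative r would support the claim that the instrument captures
# satisfaction rather than instructional efficacy.
```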

So, below you’ll find my proposed revision of the instrument – one that actually does the job, and could do it well.