Through Whose Eyes Should We Judge Teacher Evaluation Systems?


Yesterday, I critiqued Rebecca Mead for succumbing to the Nirvana Fallacy in her article on teacher evaluations.

Specifically, I noted that she strongly rejected weighting value-added measures at 50% of teacher evaluations without offering an alternative, which makes it impossible to gauge whether her preferred solution (if she has one) is better or worse than the 50% value-added proposal.

The 50% weight may not be Nirvana, but it could still be the best available option.


Many of Mead’s points concerned teachers being unfairly punished because of flaws in how value-added is calculated.

These are all fair points, but they paint a misleading picture of how evaluation systems should work.

The burden of proof for evaluation systems is not Beyond a Reasonable Doubt. We don’t have to be 99.9% certain that an evaluation system will work correctly every time. If this were the case, no company would ever be able to evaluate its employees.

It might be reasonable to argue that you’d let 999 guilty people go free to avoid executing one innocent person; it is not reasonable, in my opinion, to let 999 poor-performing teachers stay in the classroom to avoid firing one high-performing teacher.

Or to put it another way: a teacher evaluation system must balance students’ need for high-performing teachers against teachers’ need to be treated fairly.

So when we ask scaremongering questions such as "What if a teacher gets fired because of his students’ performance on a single test?", we should also ask: "What if a poor-performing teacher never gets fired because the evaluation system is toothless?"

We should view teacher evaluation policy through the eyes of the student more so than we should view it through the eyes of the teacher.


My guess is that, over time, the best way to evaluate teachers will be to combine statistical models with human judgment. I believe this for two reasons.

First, I think assessments will keep improving in both quality and frequency, which will reduce the error rate of value-added models. When value-added models are based on weekly, adaptive, computer-based tests, we will have much more data to draw on when calculating value-added ratings.

Second, I think human judgment will be important because the teacher’s role may evolve faster than our ability to measure it accurately in every area we care about. For example, as computers carry more of the instructional load, teachers may play a larger role in guiding socio-emotional development, and, at least initially, this type of teaching might be hard to measure with statistical models.

But this is mostly conjecture. The point is that evaluating teachers well will always depend on ever-changing variables, including technology and the teacher’s job role.

Given that these variables will change frequently, legislating evaluation policies carries significant risk.


Lastly, so long as we legislate evaluation policies, these policies will become proxies for tribal fights within the education community.

If we simply held schools accountable for results, however, evaluation policies would be a tool to increase student achievement. No more, no less.

Which regime sounds better to you?
