Category Archives: Teacher Evaluations

What Will Matter 50 Years From Now?


Matt Barnum, who has been doing a good job over at The Seventy Four, just wrote a thoughtful piece on Arne Duncan’s legacy.

Matt argues that Duncan should have pushed for test-based teacher evaluations only for those teachers covered by preexisting annual tests. I’m sympathetic to Matt’s argument, but I also haven’t spent much time thinking about this specific issue.

What I do sometimes think about is this: what policies will matter 50 years from now?

This is not to say that we should only focus on policies that will have 50-year staying power, but, in expending political capital, reform longevity should be part of the calculation.

I am skeptical that government-mandated teacher evaluations will still be a major issue in 50 years. My guess is that a combination of deregulation (charters not being part of state evaluation systems) and technological advancement (less reliance on annual tests for measuring teacher performance) will render the issue mostly moot.

If I had created Race to the Top, I probably would have focused on the following:

1. Governance: incentivizing alternative forms of governance (RSDs, alternative authorizers, etc.).

2. School Operators: increasing the supply of high-quality charter, contract, and voucher schools.

3. Teacher pipelines: creating new pipelines and reforming existing institutions.

4. Standards and Assessments: incentivizing the raising of standards and the adoption of rigorous assessments.

I think all of the aforementioned initiatives would have made gains in student achievement more likely. I also think they would have had some staying power.

I have no idea if they would have been politically feasible to push from the federal level in 2009.

Lastly, for whatever it’s worth, I have a lot of respect for Arne Duncan. Being a cabinet secretary for eight years takes a lot of grit and passion.

The Forest and the Trees: Performance Management Edition


Over at Valerie Strauss’ Washington Post blog, William Doyle penned a piece arguing that “corporate reform” is an insult to corporations.

William’s intention, I believe, is to demonstrate that public schools would be better off without the reforms he opposes, including Common Core standards and assessments, multiple-choice assessments, and test-score-based teacher evaluations.


As a counterpoint to these reform efforts, William details how Microsoft has stopped force-ranking its employees.

His point is that corporations such as Microsoft are moving away from the type of human capital reforms that are being implemented in public education.

This argument seems to be missing the forest for the trees.

The “trees” here are how one corporation is overhauling its human capital system.

The “forest” is that the company is doing this because it exists in a competitive landscape. William quotes from an article:

The internal change mirrors the shift CEO Satya Nadella is working to effect externally, charming and collaborating with startups and venture-capital firms so that Microsoft doesn’t get left behind.

The point is this: there is no perfect evaluation system; whatever system is best today will likely not be the best in perpetuity; and when organizations compete against each other for talent, the best systems will emerge.


Charter school districts, such as the one in New Orleans, require schools to compete for talent. How are schools responding? You can read about that in this Slate piece, which also details the efforts of YES Prep:

The network announced earlier this month a series of initiatives to improve retention, including across-the-board pay raises. In addition, more seasoned teachers will have a personal budget to spend on professional development, and more input on how their job evaluations will work. The network has also cut back on school hours and mandatory after-school activities.

Doug Lemov also recently wrote about the connection between school choice and teacher wellbeing:

In short, more choice would likely lead to higher teacher satisfaction—who wants to spend their career at odds with the organization they work for or trying to hide from the training it offers?


William’s bio indicates that he was just selected as a 2015-2016 Fulbright Scholar to study global education best practices, and that he previously managed budgets totaling over $200 million for public U.S. media companies, including HBO.

I hope that William’s Fulbright experience will allow him to study education systems, such as New Orleans’, that harness competitive principles for social aims.

The point is not that schools in these systems necessarily have the best human capital systems.

The idea is that, over time, they are more likely to develop them than schools in traditional systems are.

In a city where only one organization operates public schools, there is only one way to evaluate talent, and there’s only one place for educators to work.

This is not a recipe for success.

Through Whose Eyes Should We Judge Teacher Evaluation Systems?


Yesterday, I critiqued Rebecca Mead for succumbing to the Nirvana Fallacy in her article on teacher evaluations.

Specifically, I noted that she strongly rejected weighting value-added measures for 50% of teacher evaluations without offering any other alternative – making it impossible to gauge whether her preferred solution (if she has one) is better or worse than the 50% value-added proposal.

The 50% weight may not be Nirvana, but it could still be the best available option.


Many of Mead’s points had to do with a teacher being unfairly punished due to problems with the manner in which value-added is calculated.

These are all fair points, but they paint a misleading picture of how evaluation systems should work.

The burden of proof for evaluation systems is not Beyond a Reasonable Doubt. We don’t have to be 99.9% certain that an evaluation system will work correctly every time. If this were the case, no company would ever be able to evaluate its employees.

It might be reasonable to argue that you’d let 999 guilty people go free to avoid executing one innocent person; it’s not reasonable, in my opinion, to let 999 poor-performing teachers stay in the classroom to avoid firing one high-performing teacher.

Or to put it another way: a teacher evaluation system must balance students’ need for high-performing teachers with teachers’ need to be treated fairly.

So when we ask scaremongering questions, such as “What if a teacher gets fired because of his students’ performance on a single test?”, we should also ask: “What if a poor-performing teacher never gets fired because the evaluation system is toothless?”

We should view teacher evaluation policy through the eyes of the student more so than we should view it through the eyes of the teacher.


My guess is that, over time, the best way to evaluate teachers will be to combine statistical models with human judgment. I have two reasons for believing this.

First, I think assessments will keep increasing in both quality and frequency, which will reduce the error rate of value-added models. When value-added models are based on weekly, adaptive, computer-based tests, we will have much more data to draw on when calculating value-added ratings.

Second, I think human judgment will be important because the role of the teacher may evolve faster than our ability to develop accurate measurements in all desired areas. For example, as computers carry more of the instructional load, teachers may play a larger role in guiding socio-emotional development, and, at the outset, this type of teaching might be hard to measure with statistical models.
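To make the first point a bit more concrete, here is a toy simulation in Python. It assumes, purely for illustration, that every test measures a teacher’s true effect plus independent random noise, and that the teacher’s rating is simply the average across tests. Real value-added models adjust for prior achievement and student characteristics, and their errors are not fully independent, so this only sketches the basic statistical intuition: the spread of the estimate shrinks roughly with the square root of the number of tests. The function name and the parameter values are hypothetical.

```python
import random
import statistics


def estimate_spread(true_effect, noise_sd, n_tests, n_trials=2000, seed=0):
    """Simulate many years for one hypothetical teacher and return the
    standard deviation of the estimated effect across those years.

    Toy assumption (not a real value-added model): each test measures the
    teacher's true effect plus independent Gaussian noise, and the estimate
    is the plain mean of those measurements.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    estimates = []
    for _ in range(n_trials):
        scores = [true_effect + rng.gauss(0, noise_sd) for _ in range(n_tests)]
        estimates.append(statistics.mean(scores))
    return statistics.stdev(estimates)


# One annual test vs. roughly monthly vs. roughly weekly testing (illustrative counts).
for n_tests in (1, 9, 36):
    spread = estimate_spread(true_effect=0.1, noise_sd=1.0, n_tests=n_tests)
    print(n_tests, round(spread, 3))
```

Under these assumptions, moving from one test a year to about 36 weekly tests should cut the spread of a teacher’s estimated effect to roughly one sixth, since the square root of 36 is 6.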

But this is mostly conjecture. The point is that evaluating teachers well will always depend on ever-changing variables, including technology and job roles.

Given that these variables will change frequently, legislating evaluation policies carries significant risk.


Lastly, so long as we legislate evaluation policies, these policies will become proxies for tribal fights within the education community.

If we simply held schools accountable for results, however, evaluation policies would be a tool to increase student achievement. No more, no less.

Which regime sounds better to you?

What We Talk About When We Talk About Teacher Evaluations in the New Yorker


Rebecca Mead, a staff writer at the New Yorker, has a piece on teacher evaluations in this month’s issue.

As I’ve written before, I have very mixed feelings on legislatively mandated teacher evaluations.

Good journalism could go far in curbing some of the excesses of this policy initiative. Unfortunately, in this piece, Mead fails to rigorously analyze the issue at hand.

Instead, she falls into the two traps of education reporting: (1) focusing too much on raising or lowering the status of individuals and (2) not having a good enough grasp of the research.

Additionally, she succumbs to the Nirvana Fallacy.

Status Games

Mead’s method of raising or lowering the status of individuals is to selectively quote them. Cuomo’s quotations are strident and simple-minded, while Farina’s and de Blasio’s are warm-hearted and nuanced. None of the three is an angel; each could have been quoted in a manner that lowers his or her status, but, in this case, only Cuomo receives that treatment.

In this piece and a previous one, Mead also mentions that she sends her son to a public school where over 70% of the students opted out of the state test (in the earlier piece she insinuates that her son opted out), which seems to put her squarely in Farina and de Blasio’s camp.

By raising the status of those she agrees with and lowering the status of those she disagrees with, Mead’s writing comes off as biased.

Weak Grasp of Research 

On the data side, Mead’s discussion of the reliability of value-added ratings consists of three words; she calls it: “a contested science.”

These three words are hyperlinked to a Valerie Strauss (not a blogger known for nuance) post that highlights a single piece of research against value-added teacher evaluations.

Mead should have mentioned other studies, which, together, present a more complicated picture.

Despite it being core to her argument, she does not mention the research comparing the reliability of evaluations conducted by principals, students, and outside observers. Nor does she cover the research demonstrating that testing is a key driver of learning.

But, most surprisingly, she does not mention Chetty’s value-added study. To quote from the study:

Students assigned to high-VA teachers are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement. They are also less likely to have children as teenagers. Teachers have large impacts in all grades from 4 to 8. On average, a one standard deviation improvement in teacher VA in a single grade raises earnings by about 1% at age 28. Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase the present value of students’ lifetime income by more than $250,000 for the average classroom in our sample. We conclude that good teachers create substantial economic value and that test score impacts are helpful in identifying such teachers.

Chetty is not a corporate reformer hack. He’s a John Bates Clark Medal winner who teaches economics at Harvard.

This is not to say that the Chetty study proves that Cuomo’s proposal is right on the merits. But it is worth mentioning.

If I were a parent trying to gauge whether we should evaluate teachers by their value-added scores, I’d want to be aware of a major longitudinal study that links high value-added scores with major positive outcomes in students’ lives.

Instead of reviewing this study, however, Mead provides a hundred-word summary of de Blasio’s testimony to the budget committee in Albany.

Nirvana Fallacy

Mead asserts that “no reasonable person” denies that teachers should be evaluated; for her, the question is in the “how.”

The current system, which weights student achievement growth for 20% of the overall evaluation score, has resulted in 98.7% of teachers being rated effective.

Cuomo believes that increasing the weight of student achievement growth will deliver more accurate ratings. This may or may not be true. But Cuomo has put forth a proposal that can be evaluated.

Mead does no such thing, nor do her protagonists.

Cuomo’s proposal might not be perfect, but what we should consider is: (1) is it better than the current system? and (2) is it better than any proposed alternatives?

Mead criticizes Cuomo’s proposal without turning this same discerning eye onto the status quo policy or other alternatives.

Just because Cuomo’s proposed policy is not Nirvana doesn’t mean it isn’t the best option.

In Sum 

Mead argues that we overuse testing in public schools. To make her case, she (1) raises the status of those who agree with her, (2) lowers the status of those who don’t, (3) overlooks important research that provides evidence against her thesis, (4) criticizes a proposal that might improve teacher evaluations, and (5) then provides no alternative solution.

Lastly, she does not even address the elephant in the room: the reason we’re having this debate at all is the dysfunctional relationship between government-operated school systems and public-sector labor unions.

As I’ve argued before, I think we should let non-profit organizations operate schools, hold these schools accountable for results, and get out of the business of passing one-size-fits-all evaluation laws.

On this last point, Mead and I might be in agreement.