Category Archives: Allure of Order

What do test scores tell us about schools?

Collin Hitt, Michael Q. McShane, and Patrick J. Wolf just published a report on the connection (or lack of connection) between test scores and long-term outcomes.

They looked at a bunch of school choice studies and tried to see if a school’s impact on student test scores was connected to its impact on student life outcomes.

Their conclusion: “at least for school choice programs, there is a weak relationship between impacts on test scores and later-life outcomes.”

Much of our K12 education policy is predicated on the idea that test scores are an important measure of school performance. If this is not true, behavior should change.

The Laura and John Arnold Foundation is currently funding more research on this question, and I’m eager to see what we find, especially with regard to income gains over time. This study was predicated on high school and college attainment being an indicator of long-term outcomes, but schooling isn’t always learning. So we should be careful judging school performance based on later school attainment, rather than income (or other measures).

But for now, here’s what’s on my mind after reading the study.

What do bad test score results tell us?

If I’m reading their report correctly (and I hope the authors correct me if I’m not), it seems rare that schools have a negative impact on test scores but a positive impact on long-term outcomes.

In the 126 study comparisons where test score impacts were compared to high school or college outcomes, there were only 2 instances where a study found a significant negative impact on test scores and a significant positive impact on life outcomes.

It seems rare for a school to do really poorly on test scores but really great on life outcomes. I think the authors underemphasize this point in their paper.

What do mediocre test scores tell us?

It is much more common for schools to have a neutral (insignificant) impact on test scores but a significant positive impact on long-term outcomes. It looks like this occurred 32 times in the research review.

There may be a bunch of schools that don’t really impact test scores but are doing something that helps with long-term outcomes.

What do great test scores tell us?

There are no cases where a study found significantly positive test scores and significantly negative life outcomes.

So it seems rare for schools to jack up test scores but ruin kids’ lives. That’s good.

However, there were 17 instances of studies finding positive impact on test scores but neutral impacts on long-term outcomes.

So there seem to be some schools that are achieving good test results without translating these into great long-term outcomes.
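For a sense of scale, the counts quoted above can be expressed as shares of the 126 comparisons. This is a rough Python sketch. It assumes all four counts share the same denominator of 126, which the post implies but does not state outright, and the dictionary labels are my own shorthand rather than the report's terms.

```python
# Counts quoted in this post; labels are illustrative shorthand, not the report's wording.
TOTAL_COMPARISONS = 126  # assumption: every count below uses this denominator

quoted_counts = {
    "negative test scores, positive outcomes": 2,
    "neutral test scores, positive outcomes": 32,
    "positive test scores, negative outcomes": 0,
    "positive test scores, neutral outcomes": 17,
}


def share_of_total(count, total=TOTAL_COMPARISONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Map each pattern to its share of all comparisons.
shares = {label: share_of_total(n) for label, n in quoted_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {quoted_counts[label]} of {TOTAL_COMPARISONS} ({pct}%)")
```

Under that assumption, the "bad scores, great outcomes" pattern shows up in under 2% of comparisons, while the "neutral scores, positive outcomes" pattern shows up in roughly a quarter of them.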

How should this research affect regulation and philanthropy?

If these results hold, I think I will maintain my belief that we should replace schools with persistent very negative test scores. There appears to be little risk that these schools are really amazing schools. The negative test scores are a useful signal.

Yes, there might be other schools that are just as bad at life outcomes that are not closed because they achieve better test scores, but so long as we are closing schools that are not delivering great life outcomes, and opening schools that have a better chance of achieving great life outcomes, this seems like a worthwhile tradeoff.

But when it comes to expanding schools, if this research holds, I will rely less on positive test scores, and I think authorizers should do the same.

From an authorizer perspective, perhaps a school that does not have significantly negative test scores should be able to expand so long as there is parent demand.

Philanthropy may also need to adjust by investing more heavily in school operators that show a positive impact on life outcomes (regardless of test scores), and being willing to fund schools with mediocre test scores that either have high parent demand or are using practices that are correlated with positive long-term outcomes (more research is needed to determine what these might be).

I am very open to moving in this direction if research warrants it. The idea that it’s easier to tell a bad school than it is to identify a great school already matches my intuition, and deferring to parent judgment makes a lot of sense if we are not confident in our analysis of performance.

What if everything you believe about education is wrong?

New Orleans achieved dramatic student achievement results while, at the same time, upending much of the conventional wisdom of education policy.

The Education Research Alliance has done a bunch of great research that should humble pundits and educators alike; some highlights below:

Popular Belief #1: Experienced Teachers are Better

New Orleans increased student achievement while the % of teachers with less than 5 years of experience skyrocketed, and the % of teachers with over 20 years of experience plummeted.


Popular Belief #2: Teacher Credentialing Matters

New Orleans increased student achievement while the % of teachers with no, temporary, or the lowest level of certification (C/1) skyrocketed, and the % of teachers with advanced credentials (A/3 and B/2) plummeted.


Popular Belief #3: Teacher Turnover is Bad

New Orleans increased student achievement while teacher turnover significantly increased.


Popular Belief #4: Education Choice Markets Will Fail Due to Information Problems

A continuing criticism of choice based reforms is that parents will make poor decisions due to being incapable of understanding school quality. In New Orleans, student achievement (SPS score) was one of only a few factors that strongly increased the likelihood a family would choose a school in the open enrollment system.


Popular Belief #5: School Choice Only Benefits a Select Group of Choosers

A continuing criticism of choice based reforms is that only the active choosers will get into the schools they want. In New Orleans, 75% of families were matched with one of their top three choices.


Popular Belief #6: School Choice Increases Student Mobility

Numerous commentators have argued that school choice will cause significant increases in student mobility. New Orleans moved to an all choice system and student mobility decreased.


Popular Belief #7: School Closure Harms the Children Attending the Closed School

In New Orleans, students attending schools that were closed (intervention schools) saw their student achievement increase significantly in the subsequent years.


Popular Belief #8: Money is Best Spent “in the Classroom”

In comparison to other districts, New Orleans increased student achievement while spending more on administrative costs and less on instructional costs.


To be clear, I’m not saying that any of the above caused the student achievement gains in New Orleans.

Rather, I’m only pointing out that New Orleans saw some of the most dramatic student achievement gains in our country’s recent history while doing a bunch of things that you’re not supposed to do.

I’m sure we’ll learn more over the coming decade.

Formation: Why We’re Far Away from Peak Teacher Performance

I just read Peak: Secrets from the New Science of Expertise.

One of the book’s main arguments is this:

  1. Performance improvement is driven by: maintaining intense focus, staying on the edge of one’s comfort zone, getting immediate feedback, identifying weak points, and developing practice techniques designed specifically to address these weaknesses.
  2. This cycle is best done in fields where there is a long history of teaching that clearly articulates specific phases of mastery (musical instruments, chess, etc. all have fairly linear performance paths).
  3. Because deliberate practice is hard work, those individuals who are successful over the long run have generally found ways to keep themselves motivated and have crafted supportive environments for themselves.


Jal Mehta’s book The Allure of Order thoughtfully narrated how teaching failed to develop a professional body of knowledge.

Rather than refining practice by building a long history of evolutionary cataloguing of what works or conducting rigorous research on teaching techniques, the teaching profession formed through continuous bruising battles around contract rights.

In many cases, these battles led to real improvements in teaching workforce conditions; however, they also came at the expense of a professionalization of the practice.


So, for most of the 20th century, teaching suffered from a lack of a body of knowledge around performance progression *and* a lack of a culture of feedback.

The lessons put forth in Peak have in most ways been ignored.

Children have likely suffered.


Enter Harriet Ball.

Enter Doug Lemov.

Enter Dave Levin.

Enter Mike Goldstein.

And so forth.

Basically, you have a group of educators saying: what the f**k?

Why, in one of the world’s oldest professions, do we not have a canon of performance progression?


I am highly skeptical of most human capital education reform efforts.

I think state mandated teacher evaluations will yield little over time.

I think most education schools care more about spreading ideology than building a knowledge base around effective teaching.

I think most districts are hopeless when it comes to giving timely and precise feedback to teachers.


My guess is that the way forward is supporting the Lemov / Relay effort to capture the practices of the best teachers, and then to complement this evolutionary approach with RCTs when feasible.

We should also move from district operation of schools to non-profit operation of schools (so as to better implement cycles of feedback and create intensive, insular cultures of performance perfection, as with music academies).

But given our starting point, we’re probably decades away from hitting peak teacher performance at scale.

Is Roland Fryer Right? Or has the RCT Fallacy Reared its Ugly Head?


Roland Fryer just published a compilation guide to 196 RCTs in education. HT to my colleague Stuart Buck for passing it along.

The compilation is a good review of a bunch of interesting studies. Roland’s contributions always make me think. He also won the John Bates Clark Medal, which is basically the Nobel prize for economics for people under 40.

Yet, while this RCT compilation is informative, I’d be very, very, very hesitant to pass a bunch of laws and regulations based on this type of meta-research.


Increasingly, policy makers and pundits are using RCT evidence to make policy. This is generally a step in the right direction, and it’s great to see evidence playing a bigger role in policy making.

Yet, sometimes RCTs are more about Rigorously Contorted Tales than Randomized Controlled Trials.

Call it the RCT Fallacy.

In statistical terms, the RCT Fallacy is pretty close to the concept of external validity, but I think the RCT Fallacy has a little more psychology to it.

So here goes:

The RCT Fallacy occurs when thought leaders propose adoption of policies based on the results of RCTs so as to avoid the messiness of politics, ideology, history, psychology, and evolution.

Fryer is more balanced than most, but, in this case, I think he still succumbs to the fallacy.


The RCT Fallacy is grounded in the following:

  • There is an inverse correlation between the external validity of an RCT and the operational complexity of an industry.
  • If you have an RCT on your side, it’s much easier to defend yourself against being unreasonable, even if the RCT has very questionable external validity.
  • If you don’t have an RCT on your side, you can be called an ideologue even if you’re making a very well thought out case.
  • This leads to the perverse incentive of thought leaders being in a safer place trumpeting policies with modest RCT support rather than proposing solutions that are grounded in a deep understanding of systems, organizations, and humans – but which are difficult to measure with RCTs.
  • RCTs overvalue what can be measured quantitatively.
  • RCTs overvalue the worth of understanding existing best practices and testing pilots over the creation of entire systems that accelerate new best practices.
  • In complex systems with complex organizations, evolution is a better change mechanism than running RCTs and implementing best practice adoption, especially in policy areas where some type of accountability (user choice, output measurement, etc.) can “kill off” bad ideas.
  • Quasi-experimental studies are often a better way to capture the impact of complex systems, as it is very difficult to conduct large scale RCTs on system level policy adoption.


In other words, RCTs will never tell us:

  • Whether democracies are better than dictatorships.
  • How to invent an iPhone.
  • Whether capitalism is better than Communism.
  • Whether single payer health systems are better than market based health systems.
  • Whether or not a start-up will be successful.

Yes, well designed RCTs can inform our decisions on the above issues, but RCTs will not provide definitive evidence on these issues.


Fryer’s paper ends with his summary of the RCT evidence in education.

He argues that RCTs have demonstrated that four interventions work: pre-k, high dosage tutoring, managed teacher PD, and charter schools.

The paper ends with the following rallying cry:

[Screenshot of the paper’s closing passage; image not reproduced.]

I’m not sure courage is what we need:

Pre-K: There is pretty mixed evidence on our ability to scale effective pre-k. Fryer himself notes: “of the 64 treatment effects recorded in these randomized studies [on pre-k], 21 were statistically positive; zero were statistically negative and 43 were statistically indistinguishable from zero.”

Again, I’m not sure “courage” is the term I’d use to describe scaling an intervention that shows no statistically detectable effect 67% of the time.
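The 67% figure follows directly from the counts in Fryer’s quote. Here is a quick sketch of the arithmetic in Python. The variable and function names are just illustrative, and the only inputs are the three counts quoted above.

```python
# Counts taken from the Fryer quote above (pre-k treatment effects).
positive_effects = 21
negative_effects = 0
null_effects = 43

total_effects = positive_effects + negative_effects + null_effects


def percent(count, total):
    """Whole-number percentage of total represented by count."""
    return round(100 * count / total)


null_share = percent(null_effects, total_effects)
positive_share = percent(positive_effects, total_effects)

print(f"{null_effects} of {total_effects} effects were indistinguishable from zero ({null_share}%)")
print(f"{positive_effects} of {total_effects} effects were statistically positive ({positive_share}%)")
```

So roughly two in three recorded effects were statistically indistinguishable from zero, and about one in three were positive.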

Tutoring: Fryer covers some high-dosage tutoring studies that show strong effects. However, the costs of these programs are sometimes upwards of 20% of total per-student spending. Moreover, there would likely be severe human capital limitations if we tried to give high dosage tutoring to all the students who needed it.

Managed Teacher PD: Fryer covers studies that show success for Success For All and Reading Recovery programs. The data seems robust and schools should surely consider adopting these programs. But here’s the thing: nothing is preventing districts from adopting these programs right now!

Either districts know something that these RCTs aren’t picking up, or districts are so poorly run that it takes a dramatic intervention to get them to adopt effective programs that have been around for 10+ years.

Charter Schools: While I clearly support charter expansion, charter RCTs often rely on lottery data, which limits trials to oversubscribed schools (and thus creates positive bias). As such, I generally find CREDO’s far-reaching urban quasi-experimental studies to be of more use.


Again, I don’t mean to pick on Fryer. I’ve learned a ton from reading his research and children would be better off if universities were filled with thinkers like him. His work on “looking under the hood” of high-performing charters greatly influenced my thinking on schools, as has his research on tutoring.

Moreover, it’s much better to try and build a policy regime from RCTs than from the weak theory that comes out of many education departments.

But, ultimately, I don’t think that (a) the RCTs covered in his study make a strong case for the scaling of his preferred interventions or (b) that RCTs can ever really tell us how to best design our public education systems.

I do think we should utilize RCTs to help schools make choices about which practices to adopt, but, ultimately, we should utilize theory and quasi-experimental evidence to handle the major public policy questions concerning education, which in my mind have more to do with system structure than educational practice.

“We” (researchers, thought leaders, policy makers, etc.) shouldn’t be operationally scaling much; rather, we should be running experiments that give empowered educators and families more information to make great choices.

Four Ways to Unwind the Allure of Order


Note: the content below is probably better suited for a short book. But I tried to stuff it into a post. Clearly, there is much I’m still working through. Feedback would be great.


In his book The Allure of Order, Jal Mehta identifies two major problems with American education: (1) elites keep on initiating top-down accountability reforms that only lead to modest performance increases; and (2) the teaching profession has failed to professionalize into a field that self-regulates through codification of practice, pragmatic research that leads to performance improvements, and professional standards.

As I noted in a previous post, the problem here is that neither of these conditions appears to be changing anytime soon. Top-down annual testing has the political support of elites, civil rights leaders, and even union leadership. And numerous attempts to overhaul teacher preparation have for the most part been blocked by colleges of education.


How might we get out of this?

In considering different strategies, I tried to predict how a few key variables (which are embedded in Jal’s argument) might be impacted:

(1) Human capital: would the reforms increase our ability to recruit and develop excellent educators?

(2) Innovation: would the reforms increase our ability to experiment, research, and learn?

(3) Accountability: would the reforms increase elite trust in education so that top-down accountability might be loosened?

(4) Time: how long would it take to scale the reforms?


Based on the above variables, here are the four strategies I came up with. Three of them entail moving away from government operation of schooling, and one does not.

(1) Nevada: Scale the Nevada education savings account model; basically: give every family an education debit card, put minimal restrictions on expenditures, and let the market work.

(2) The Non-Profit Flip: No city has tried this, but I’ve wondered whether we should create a legislative framework that allows cities to opt in to a 100% non-profit model. Basically, a state would allow cities to convert all their schools into non-profits over a set time period, say 2-3 years.

(3) Pump Charters: This would entail basically trying to maintain a 10% annual national charter growth rate over the next 25 years, which would get us to around 50% national charter market share.

(4) Fight For Finland: Alternatively, we could try and maintain government operation of schools but achieve what we’ve failed to achieve to date: a major increase in the quality of teacher recruitment and development and a loosening of accountability.
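The Pump Charters arithmetic can be checked with a quick compounding sketch. The post does not state the starting charter market share, so the code below works backward to the share that 10% annual growth over 25 years would need to start from in order to reach 50%. The function names are illustrative, and the model assumes total enrollment stays flat, which is a simplification.

```python
def growth_multiple(annual_rate, years):
    """Total growth factor after compounding annual_rate for the given number of years."""
    return (1 + annual_rate) ** years


def implied_starting_share(target_share, annual_rate, years):
    """Starting share that compounds up to target_share (assumes flat total enrollment)."""
    return target_share / growth_multiple(annual_rate, years)


# Figures from strategy (3): 10% annual growth, 25 years, ~50% target share.
multiple = growth_multiple(0.10, 25)
start = implied_starting_share(0.50, 0.10, 25)

print(f"Growth multiple over 25 years: {multiple:.1f}x")
print(f"Implied starting market share: {start:.1%}")
```

In other words, 25 years of 10% growth is a bit under an elevenfold increase, so the 50% figure is consistent with a starting share somewhat below 5%.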


A brief analysis:

Nevada
  • Better Human Capital? Probably: for-profit incentives could attract a lot of talent, though likely more at management than teacher level.
  • Increased Innovation? Yes: putting funds in family hands will allow for entrepreneurs to create solutions for family needs.
  • More Flexible Accountability? Yes: in Nevada, reduced accountability (simple norm referenced reporting) is already baked into the model.
  • Time to Scale: Presumably, functioning markets could be created in a few years in most states.

The Flip
  • Better Human Capital? Not initially: a rapid switch to non-profit model would most likely utilize existing talent.
  • Increased Innovation? Not initially: given flip would be result of conversion rather than entrepreneurship, existing model likely to be prevalent. Long-term, new models could replace failed schools.
  • More Flexible Accountability? Not likely: this reform would probably be based on portfolio style accountability and performance management.
  • Time to Scale: Few years to convert schools to non-profits.

Pump Charters
  • Better Human Capital? Yes: best charters have demonstrated ability to recruit and develop great educators; also already seeing codification (Relay) and research (partnering with Harvard, MIT, etc.).
  • Increased Innovation? Yes: charters have been driving innovation (blended, diverse by design, etc.), though more to be done here.
  • More Flexible Accountability? Perhaps: if all schools were run by decent operators, elites might be more willing to loosen reins.
  • Time to Scale: Probably 20-30 years.

Fight For Finland
  • Better Human Capital? Unlikely: numerous calls to reform ed schools have failed, why will this time be different?
  • Increased Innovation? No: existing government operated model has not led to much innovation; this won’t change.
  • More Flexible Accountability? Unlikely: so long as the teaching force and schools feel and perform the same, elites will maintain demands.
  • Time to Scale: Probably 20-30 years to overhaul ed schools and influence elites (it took Finland decades).


Basically, you have two strategies that can work fast: Nevada and The Flip.

The upside of Nevada is that you get rapid deregulation, the conditions for a lot of innovation, and a baked in loosening of the reins. The downside is that there is not an intentional human capital strategy, and there are a ton of risks in the deregulation going wrong.

The Flip gets you educator autonomy very quickly, but it does not intentionally focus on human capital pipelines or entrepreneurship, so while it sets the conditions for rapid change, it will not deliver it overnight. Moreover, given that all of these non-profits would need to be performance managed, it probably maintains the need for heavy accountability.

Then you have two strategies that work slowly: Pump Charters and Fight For Finland.

Pump Charters is appealing in that it has an explicit human capital strategy (alt providers, charters developing their own, Relay, etc.), is based on entrepreneurship (which will drive innovation), and could potentially build up enough trust to loosen accountability. If every school in a city were run by KIPP, Uncommon, Summit, and DSST, it’s not hard to imagine moving toward less testing, as there would be less of a need for constant monitoring. The downside here is that the strategy would take decades to even get to 50% market share, and the sector remains uneven in quality.

Fight For Finland is appealing in that it is a path other countries have taken. Increasing human capital, increasing autonomy, and loosening accountability has worked elsewhere. Though, from what I gather, it takes countries decades to make these types of reforms. Additionally, I imagine they are much harder to accomplish in a large nation with decentralized governance of schools. We should take something from the fact that we’ve failed to accomplish this over the past 100 years of reform in our country.


In sum: it feels like Pumping Charters, with side bets like Nevada, might be the best way forward.

Pumping Charters has a twenty year history, and in Jal’s terms, it is thick (encompasses human capital, instruction, innovation, research, etc). I also think that Pumping Charters has an upside that is higher than Fight for Finland, though this of course remains unproven at scale.

Nevada is a high-upside, high-risk bet, but if it works, we should double down on it.

So perhaps Pumping Charters should be the default path to push down until we can find a quicker method of reform (and we should keep making side bets while we’re Pumping Charters).

Of course, to the extent education schools get better, it helps all of these strategies. So while I remain skeptical that we’ll see any major changes soon, it seems like a side bet worth making as well.

Lastly, note that scaling high-performing charters and reforming our current system work on roughly the same time horizon here. So next time someone tells you we have to focus on districts because that’s where the kids are, tell this person that she is asking the wrong question. The question is not: where are the children now? The question is: how long will it take to fix at scale?

Can We Unwind the Allure of Order and Safety?


I recently wrote about Jal Mehta’s excellent book: The Allure of Order.

The book’s title refers to elite attempts to improve public education via repeated cycles of standards and accountability based reforms.

I coined the phrase the Allure of Safety to describe another issue that Jal raises: teaching has not matured into a modern profession (one that is spurred forward by useful research, best practice standardization, and practitioner driven innovation and self-regulation).

I believe teachers have (intentionally or not) taken a bargain whereby they have traded increased professionalization for the safety of onerous union contracts and mutually beneficial relationships with bureaucracies.

If it is true that both the Allure of Order and Allure of Safety are preventing us all (citizens, educators, children) from having the schools we want – how could we walk back from these Allures?

I’m not sure.

Here’s the main issue I’m grappling with: I don’t know how we should sequence the unwindings.

Begin with Educators?

On one hand, you could argue that we need to begin with teacher recruitment and development, and that once these efforts are in place, we can begin to unwind top down mandates and put more trust in well developed talent.

But as David Steiner noted in his critique of Jal’s book, Jal doesn’t present a politically feasible and concrete path forward on this route. Even worse (for those who find this path appealing), Jal narrates in great detail a recent failed national effort to do just this.

Ben Riley and Deans for Impact are trying to make change here but have yet to prove that they can do so. Ditto for Hank Levin and his new effort.

Begin with Elites?

On the other hand, you could argue that we need to begin with the relaxing of top down accountability so as to create on-the-ground conditions that might foster increased partnership with educators.

But, as the current attempts to reauthorize NCLB are demonstrating, the elite consensus around annual testing (and other forms of top-down accountability) remains very much intact.

Moreover, removing top down accountability without any real reforms in educator recruitment and development might wash away the modest gains that accountability has appeared to deliver.

So What to Do?

I’ll try to tackle this in my next post on Jal’s book.

The Allure of Order: Book Review Part I


I just finished Jal Mehta’s The Allure of Order. Over the coming weeks, I’ll be blogging about the book.

The Allure of Order is an excellent book and should be a contender for education book of the year. Jal does an admirable job of deep historical analysis, policy criticism, and solution seeking. I imagine people on all sides of reform debates will find much to their liking. Do read it.

Here is how Jal frames why he wrote the book:

[Screenshot of a passage from the book; image not reproduced.]

Jal’s basic premise is that American education reform has suffered, in part, due to the combination of:

  1. America’s weak welfare state and an associated belief that schools can solve more problems than they probably can.
  2. The failure of the teaching profession (practitioners and researchers alike) to professionalize their field through rigorous research, standards of practice, and field advancements.
  3. The fact that the decentralized operation of our education system contributes to wide variations in quality.
  4. The ability of a diverse coalition of elites to exert moral power to demand increasingly centralized levels of standards and accountability over our decentralized school systems.

While it’s impossible to fully explain a hundred years of education history with a few broad strokes, these four conditions do seem to have a lot of explanatory power.

Of course, this analysis raises an important question: is a hundred years of standards and accountability reform the result of a morally legitimate desire to inculcate high expectations, or is it the equivalent of saying the beatings will continue until morale improves?

Ultimately, it’s probably both, which helps explain why education is so divisive. In many ways, it pits a morally just vision (children, poor and minority included, can achieve!) against an exasperated field (how can we educators achieve this vision with poor training, little research, a weak welfare state, and dysfunctionally governed school systems?).

How to fix this?

The political knot seems to be this: elites seem unable to deliver what educators need (better training, practice focused research, real autonomy, and non-educational supports for children), and educators seem unable to let go of the institutions and values that protect but ultimately limit them (thousand page collective bargaining agreements and district bureaucracies).

In other words: while too many elites suffer from the Allure of Order, too many educators suffer from the Allure of Safety.

Together, the Allure of Order and the Allure of Safety seem to be at the heart of our educational problems.