Accountability and Evidence

Illustration by James Bravo


At some level, just about everyone involved in education is in favor of “using what works.” There are plenty of healthy arguments about how we find out what works and how evidence gets translated into practice, but it’s hard to support a position that we shouldn’t use what works under at least some definition of evidence.

However, the dominant idea among policy makers about how we find out what works seems to be “Set up accountability systems and then learn from successful teachers, schools, systems, or states.” This sounds sensible, but in fact it is extremely difficult to do.

This point is made in a recent blog post by Tom Kane. Here’s a key section of his argument:

[In education] we tend to roll out reforms broadly, with no comparison group in mind, and hope for the best. Just imagine if we did that in health care. Suppose drug companies had not been required to systematically test drugs, such as statins, before they were marketed. Suppose drugs were freely marketed and the medical community simply stood back and monitored rates of heart disease in the population to judge their efficacy. Some doctors would begin prescribing them. Most would not. Even if the drugs were working, heart disease could have gone up or down, depending on other trends such as smoking and obesity. Two decades later, cardiologists would still be debating their efficacy. And age-adjusted death rates for heart disease would not have fallen by 60 percent [as they have] since 1980.

Kane was writing about big federal policies, such as Reading First and Race to the Top, which cannot be evaluated because they are rolled out nationally before their impact is known. But the same is true of smaller programs and practices. It is very difficult to look at, for example, more and less successful schools (on accountability measures) and figure out what they did that made the difference. Was it a particular program or practice that other schools could also adopt? Or was it that better-scoring schools were lucky in having better principals and teachers, or that their intakes or neighborhoods are changing, or any number of other factors that may not even be stable for more than a year or two?

Accountability is necessary for communities to find out how students are doing. All countries have some test-based accountability (though none test every year, as we do from grades 3 through 8), but anyone who imagines that we can just look at test scores to find what works and what doesn’t is not being realistic.

The way we can find out what works is to compare schools or classrooms assigned to use any given program with those that continue current practices. Ideally, schools and classrooms are assigned at random to experimental or control groups. That’s how we find out what works in medicine, agriculture, technology, and other areas.
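For readers who like to see the logic spelled out, here is a minimal sketch of random assignment in Python. Everything in it is an assumption made for illustration: the school names, the simulated scores, and the `TRUE_EFFECT` value are made up, and a real evaluation would also need enough schools, measures of uncertainty, and attention to how well the program was implemented.

```python
import random
from statistics import mean


def assign_at_random(schools, seed=0):
    """Shuffle the schools and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(schools)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, program, control):
    """Difference in mean outcome between program and control schools."""
    program_mean = mean(outcomes[s] for s in program)
    control_mean = mean(outcomes[s] for s in control)
    return program_mean - control_mean


# Hypothetical school names, purely for illustration.
schools = [f"school_{i}" for i in range(20)]
program, control = assign_at_random(schools)

# Simulated scores: every school gets a noisy baseline, and program
# schools get an assumed boost. These are not real data.
TRUE_EFFECT = 5.0
program_set = set(program)
score_rng = random.Random(1)
outcomes = {
    s: score_rng.gauss(50, 10) + (TRUE_EFFECT if s in program_set else 0.0)
    for s in schools
}

print(round(estimated_effect(outcomes, program, control), 1))
```

Because chance alone decides which schools get the program, the two groups should be similar on average in principals, teachers, intake, and neighborhood. The difference in mean scores then estimates the program's effect, though with only 20 simulated schools it will land near the assumed effect rather than exactly on it.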

I know I’ve pointed this out in previous blog posts, and I’ll point it out in many to come. Sooner or later, it has to occur to our leaders that in education, too, we can use experiments to test good ideas before we subject millions of kids to something that will probably fail to improve their achievement. Again.


Accountability for the Top 95 Percent


Perhaps the most controversial issue in education policy is test-based accountability. Since the 1980s, most states have had tests in reading and math (at least), and have used average school test scores for purposes ranging from praising or embarrassing school staffs to providing financial incentives or closing down low-scoring schools. Test-based accountability became national with NCLB, which required annual testing in grades 3-8 and prescribed sanctions for low-achieving schools. The Obama administration added to this an emphasis on using student test scores as part of teacher evaluations.

The entire test-based accountability movement has paid little attention to evidence. In fact, in 2011, the National Research Council reviewed research on high-stakes accountability and found few benefits.

There’s nothing wrong with testing students and identifying schools in which students appear to be making good or poor progress in comparison to other schools serving students with similar backgrounds, as long as this is just used as information to identify areas of need. What is damaging about accountability is the use of test scores for draconian consequences, such as firing principals and closing schools. The problem is that terror is just not a very good strategy for professional development. Teachers and principals afraid of punishment are more likely to use questionable strategies to raise their scores: teaching to the test, reducing time on non-tested subjects, trying to attract higher-achieving kids or get rid of lower performers, not to mention out-and-out cheating. Neither terror nor the hope of rewards does much to fundamentally improve day-to-day teaching, because the vast majority of teachers are already doing their best. There are bad apples, and they need to be rooted out. But you can’t improve the overall learning of America’s children unless you improve daily teaching practices for the top 95% of teachers, the ones who come to work every day, do their best, care about their kids, and go home dead tired.

Improving outcomes for the students of the top 95% requires top-quality, attractive, engaging professional development to help teachers use proven programs and practices. Because people are more likely to take seriously professional development they’ve chosen, teachers should have choices (as a school or department, primarily) of which proven programs they want to adopt and implement.

The toughest accountability should be reserved for the programs themselves, and the organizations that provide them. Teachers and principals should have confidence that if they do adopt a given program and implement it with fidelity and intelligence, it will work. They should also know that they’ll get enough training and coaching to make the program work. That confidence is best earned in large experiments in which teachers in many schools use innovative programs, and outcomes are compared with those of similar schools without the programs.

Offering a broad range of proven programs would give local schools and districts expanded opportunities to make wise choices for their children. Just as evidence in agriculture informs but does not force choices by farmers, evidence in education should enable school leaders to advance children’s learning in a system of choice, not compulsion.

If schools had choices among many proven programs, in all different subjects (tested as well as untested), the landscape of accountability would change. Instead of threatening teachers and principals, government could provide help for schools to adopt programs they want and need. Offering proven programs provides a means of improving outcomes even in untested areas, such as science, social studies, and foreign language. As time goes on, more and better programs with convincing evaluation evidence would appear, because developers and funders would perceive the need for them.

Moving to a focus on evidence-based reform will not solve all of the contentious issues about accountability, but it could help us focus the reform conversation on how to move forward the top 95% of teachers and schools—the ones who teach 95% of our kids—and how to put accountability in proper proportion.