Making Evidence Primary for Secondary Readers

In the wonderful movie Awakenings, Robin Williams plays a research neuroscientist who has run out of grants and therefore applies for a clinical job at a mental hospital. In the interview, the hospital’s director asks him about his research.

“I was trying to extract myelin from millions of earthworms,” he explains.

“But that’s impossible!” says the director.

“Yes, but now we know it’s impossible,” says Robin Williams’ character.

I recently had an opportunity to recall this scene. I was traveling back to Baltimore from Europe. Whenever I make this trip, I use the eight or so uninterrupted hours to do a lot of work. This time I was reading a giant stack of Striving Readers reports, because I am working with colleagues to update a review of research on secondary reading programs.

Striving Readers, part of Reading First, was a richly funded initiative of the George W. Bush administration that gave money to states to help them adopt intensive solutions for below-level readers in middle and high schools. The states implemented a range of programs, almost all of them commercial programs designed for secondary readers. To their credit, the framers of Striving Readers required rigorous third-party evaluations of whatever the states implemented, and those were the reports I was reading. Unfortunately, it apparently did not occur to anyone to suggest that the programs have their own evidence of effectiveness prior to being implemented and evaluated as part of Striving Readers.

As you might guess from the fact that I started off this blog post with the earthworm story, the outcomes are pretty dismal. A few of the studies found statistically significant impacts, but even those found very small effect sizes, and only on some measures or subgroups, not others.

I’m sure I and others will learn more as we get further into these reports, which are very high-quality evaluations with rich measures of implementation as well as outcomes. But I wanted to make one observation at this point.

Striving Readers was a serious, well-meaning attempt to solve a very important problem faced by far too many secondary students: difficulties with reading. I’m glad the Department of Education was willing to make such an investment. But next time anyone thinks of doing something on the scale of Striving Readers, I hope they will provide preference points in the application process for applicants who propose to use approaches with solid evidence of effectiveness. I also hope government will continue to fund development and evaluation of programs to address enduring problems of education, so that when it does start providing incentives for using proven programs, there will be many to choose from.

As with the earthworm research in Awakenings, finding out conclusively what doesn’t work is a contribution to science. But in education, how many times do we have to learn what doesn’t work before we start supporting programs that we know do work? It’s time to recognize on a broad scale that programs proven to work in rigorous evaluations are more likely than other approaches to work again if implemented well in similar settings. Even earthworms learn from experience. Shouldn’t we do the same?

Accountability and Evidence

Illustration by James Bravo


At some level, just about everyone involved in education is in favor of “using what works.” There are plenty of healthy arguments about how we find out what works and how evidence gets translated into practice, but it’s hard to support a position that we shouldn’t use what works under at least some definition of evidence.

However, the dominant idea among policy makers about how we find out what works seems to be “Set up accountability systems and then learn from successful teachers, schools, systems, or states.” This sounds sensible, but in fact it is extremely difficult to do.

This point is made in a recent blog post by Tom Kane. Here’s a key section of his argument:

[In education] we tend to roll out reforms broadly, with no comparison group in mind, and hope for the best. Just imagine if we did that in health care. Suppose drug companies had not been required to systematically test drugs, such as statins, before they were marketed. Suppose drugs were freely marketed and the medical community simply stood back and monitored rates of heart disease in the population to judge their efficacy. Some doctors would begin prescribing them. Most would not. Even if the drugs were working, heart disease could have gone up or down, depending on other trends such as smoking and obesity. Two decades later, cardiologists would still be debating their efficacy. And age-adjusted death rates for heart disease would not have fallen by 60 percent [as they have] since 1980.

Kane was writing about big federal policies, such as Reading First and Race to the Top, which cannot be evaluated because they go national before their impact is known. But the same is true of smaller programs and practices. It is very difficult to look at, for example, more and less successful schools (on accountability measures) and figure out what they did that made the difference. Was it a particular program or practice that other schools could also adopt? Or was it that better-scoring schools were lucky in having better principals and teachers, or that their intakes or neighborhoods were changing, or any number of other factors that may not even be stable for more than a year or two?

Accountability is necessary for communities to find out how students are doing. All countries have some test-based accountability (though none test every year, as we do from grades 3 through 8), but anyone who imagines that we can just look at test scores to find what works and what doesn’t is not being realistic.

The way we can find out what works is to compare schools or classrooms assigned to use any given program with those that continue current practices. Ideally, schools and classrooms are assigned at random to experimental or control groups. That’s how we find out what works in medicine, agriculture, technology, and other areas.
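For readers who like to see the mechanics, here is a minimal sketch of that kind of comparison in Python. Everything in it is an assumption made for illustration: the function names `assign_at_random` and `estimated_impact`, the twenty made-up schools, the simulated scores, and the built-in 3-point program effect. It is not how any particular evaluation was run, only the basic logic of assigning schools by lottery and then comparing average outcomes.

```python
import random
import statistics


def assign_at_random(schools, seed=0):
    """Split schools into experimental and control groups by lottery."""
    rng = random.Random(seed)  # fixed seed so the lottery can be audited
    shuffled = list(schools)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_impact(scores, experimental, control):
    """Difference between the two groups' mean outcome scores."""
    exp_mean = statistics.mean(scores[s] for s in experimental)
    ctl_mean = statistics.mean(scores[s] for s in control)
    return exp_mean - ctl_mean


# Hypothetical schools and simulated scores, for illustration only.
schools = [f"School {i:02d}" for i in range(1, 21)]
experimental, control = assign_at_random(schools, seed=42)

rng = random.Random(7)
ASSUMED_TRUE_EFFECT = 3.0  # assumption: the program adds 3 points on average
scores = {}
for school in schools:
    baseline = rng.gauss(50, 10)  # schools differ for many unrelated reasons
    boost = ASSUMED_TRUE_EFFECT if school in experimental else 0.0
    scores[school] = baseline + boost

print("Estimated impact:", round(estimated_impact(scores, experimental, control), 2))
```

Because the lottery, not the schools themselves, decides who gets the program, differences in principals, teachers, or neighborhoods tend to balance out across the two groups. With only twenty schools the estimate will still bounce around the true effect, which is one reason real evaluations need adequate samples.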

I know I’ve pointed this out in previous blog posts, and I’ll point it out in many to come. Sooner or later, it has to occur to our leaders that in education, too, we can use experiments to test good ideas before we subject millions of kids to something that will probably fail to improve their achievement. Again.