Evidence at Risk in House Spending Bill

The House Appropriations Committee yesterday marked up its fiscal year 2016 spending bill for the Departments of Labor, Health and Human Services, and Education. The bill put forward by the majority reduces Department of Education funding by $2.8 billion, mostly by eliminating approximately two dozen programs and severely cutting back several others, so it is no surprise that the bill passed through the Committee along party lines.

I can’t speak for all of the affected programs, but I do want to address what some of these proposed cuts could do. In a word, they would devastate the movement toward evidence as a basis for policy and practice in education.

First, the House bill would eliminate Investing in Innovation (i3). i3 has been the flagship for “tiered evidence” initiatives, providing large scale-up grants for programs that already have substantial evidence of effectiveness, smaller “validation” grants for programs with some evidence to build up their evidence base, and much smaller “development” grants for programs worth developing, piloting, and evaluating. At $120 million per year, i3 costs about 50¢ per taxpayer. What we get for that 50¢ a year is a wide variety of promising programs at all grade levels and in all subjects, serving thousands of mostly high-poverty schools nationwide. We also get evidence on the effectiveness of these programs, which tells us which are ready for broader use in our schools. That evidence informs the whole $630-billion public education enterprise, especially the $15-billion Title I program. Put another way, i3 costs about 2¢ for every $100 spent on public education.

Congresswoman Chellie Pingree of Maine offered an amendment to restore i3 and increase its funding to $300 million, the level the president had proposed. Offering the amendment gave members the opportunity to discuss the importance of i3, but in the end it was withdrawn (a not-uncommon procedural move when an amendment lacks an offset or is not expected to pass).
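
For readers who want to see the arithmetic, here is a quick back-of-the-envelope check (a sketch in Python): the roughly 240 million taxpayers implied by the 50¢ figure is an assumption, while the $120 million budget and the $630 billion total come from the paragraph above.

    # Back-of-the-envelope check of the i3 cost figures cited above.
    # The ~240 million taxpayer count is an assumption implied by the 50-cent figure;
    # the $120 million i3 budget and the $630 billion education total come from the text.
    i3_budget = 120e6        # annual i3 appropriation, in dollars
    taxpayers = 240e6        # assumed number of taxpayers (hypothetical)
    k12_spending = 630e9     # total annual public education spending, in dollars

    per_taxpayer = i3_budget / taxpayers                    # ~0.50 dollars per taxpayer
    cents_per_100 = i3_budget / k12_spending * 100 * 100    # ~1.9 cents per $100 spent

    print(f"${per_taxpayer:.2f} per taxpayer")
    print(f"{cents_per_100:.1f} cents per $100 of education spending")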

Second, the House proposal would significantly reduce funding for the Institute of Education Sciences (IES). IES commissions a wide variety of educational research, data collection, communications about evidence, and standard-setting for evidence, at a very modest cost. In this case, Congressman Mike Honda of California offered and withdrew an amendment to restore IES funding to its FY15 level of $574 million.

Finally, the main targets of the proposed cuts were discretionary programs, which provide direct services to students. Districts, states, and other entities must apply for these pots of money (as distinct from funds such as Title I or IDEA that are distributed by formula). Examples include Striving Readers (for struggling secondary readers); School Improvement Grants, or SIG (for low-performing schools); Preschool Development Grants; Mathematics and Science Partnerships; Ready to Learn (educational television); and several others.

These discretionary programs are the ones that could most easily be focused on evidence. One practical example is SIG, which recently added a category of approved expenditures for whole-school reform programs with at least moderate evidence of effectiveness, which means, among other things, having been tested against a control group in at least one rigorous experiment. As another example, Title II SEED grants for professional development now require that programs adopted under SEED funding have at least moderate evidence of effectiveness. Congresswoman Rosa DeLauro offered an amendment to reinstate many of these programs, and it failed along party lines.

Adding evidence as a requirement or encouraging use of proven programs is much easier with discretionary programs than with formula grants. Yet if the House bill were to become law, there would be very few discretionary programs left.

The House proposal would greatly reduce national capacity to find out what works and what does not, and to scale up proven programs and practices. I very much hope our leaders in Congress will rethink this strategy and retain funding for the government programs most likely to help all of us learn — policy makers, educators, and kids alike.

Evidence-Based vs. Evidence-Proven

Way back in 2001, when we were all a lot younger and more naïve, Congress passed the No Child Left Behind Act (NCLB). It had all kinds of ideas in it, some better than others, but those of us who care about evidence were ecstatic about the often-repeated requirement that federal funds be used for programs “based on scientifically-based research (SBR),” particularly “based on scientifically-based reading research (SBRR).” SBR and SBRR were famously mentioned 110 times in the legislation.

The emphasis on research was certainly novel, even revolutionary in many ways, and it led to many positive actions. The Education Sciences Reform Act, passed soon afterward in 2002, created the Institute of Education Sciences (IES), which has greatly increased the rigor and sophistication of research in education. IES and other agencies promoted training of graduate students in advanced statistical methods and supported the founding of the Society for Research in Educational Effectiveness (SREE), which has itself had considerable impact on rigorous research. The U.S. Department of Education has commissioned high-quality evaluations of a variety of interventions, including computer-assisted instruction, early childhood curricula, and secondary reading programs. IES funded the development and evaluation of numerous new programs, and the methodologies it promoted are essential to Investing in Innovation (i3), a larger effort focused on developing and evaluating promising programs in K-12 education.

The one serious limitation of the evidence movement up to the present is that while it has greatly improved research and methodology, it has not yet had much impact on practices in schools. Part of the problem is just that it takes time to build up enough of a rigorous evidence base to affect practice. However, another part of the problem is that from the outset, “scientifically-based research” was too squishy a concept. Programs or practices were said to be “based on scientifically-based research” if they generally went along with accepted wisdom, even if the specific approaches involved had never been evaluated. For example, “scientifically-based reading research” was widely interpreted to support any program that included the five elements emphasized in the 2000 National Reading Panel (NRP) report: phonemic awareness, phonics, vocabulary, comprehension, and fluency. Every reading educator and researcher knows this list, and most subscribe to it (and should do so). Yet since NCLB was enacted, National Assessment of Educational Progress reading scores have hardly budged, and evaluations of specific programs that just train teachers in the five NRP elements have had spotty outcomes, at best.

The problem with SBR/SBRR is that just about any modern instructional program can claim to meet the standard. “Based on…” is a weak criterion, subject to anyone’s interpretation.

In contrast, government is beginning to define levels of evidence far more precise than “based on scientifically-based research.” For example, the What Works Clearinghouse (WWC), the Education Department General Administrative Regulations (EDGAR), and the i3 regulations all have sophisticated definitions of proven programs. These typically require that a program be compared to a control group, assessed with fair and valid measures, analyzed with appropriate statistical methods, and so on.

The more rigorous definitions of “evidence-proven” mean a great deal as education policies begin to encourage or provide incentives for schools to adopt proven programs. If programs only have to be “based on scientifically-based research,” then just about anything will qualify, and evidence will continue to make little difference in the programs children receive. If more stringent definitions of “evidence-proven” are used, there is a far greater chance that schools will be able to identify what really works and make informed choices among proven approaches.

Evidence-based and evidence-proven differ by just one word, but if evidence is truly to matter in policy, this is the word we have to get right.

Making Evidence Primary for Secondary Readers

In the wonderful movie Awakenings, Robin Williams plays a research neuroscientist who has run out of grants and therefore applies for a clinical job at a mental hospital. In the interview, the hospital’s director asks him about his research.

“I was trying to extract myelin from millions of earthworms,” he explains.

“But that’s impossible!” says the director.

“Yes, but now we know it’s impossible,” says Robin Williams’ character.

I recently had an opportunity to recall this scene. I was traveling back to Baltimore from Europe. Whenever I make this trip, I use the eight or so uninterrupted hours to do a lot of work. This time I was reading a giant stack of Striving Readers reports, because I am working with colleagues to update a review of research on secondary reading programs.

Striving Readers, a secondary-grades companion to Reading First, was a richly funded initiative of the George W. Bush administration that gave money to states to help them adopt intensive solutions for below-level readers in middle and high schools. The states implemented a range of programs, almost all of them commercial products designed for secondary readers. To their credit, the framers of Striving Readers required rigorous third-party evaluations of whatever the states implemented, and those were the reports I was reading. Unfortunately, it apparently did not occur to anyone to suggest that the programs have evidence of effectiveness of their own before being implemented and evaluated as part of Striving Readers.

As you might guess from the fact that I started off this blog post with the earthworm story, the outcomes are pretty dismal. A few of the studies found statistically significant impacts, but even those found very small effect sizes, and only on some measures or subgroups, not others.

I’m sure I and others will learn more as we get further into these reports, which are very high-quality evaluations with rich measures of implementation as well as outcomes. But I wanted to make one observation at this point.

Striving Readers was a serious, well-meaning attempt to solve a very important problem faced by far too many secondary students: difficulties with reading. I’m glad the Department of Education was willing to make such an investment. But next time anyone thinks of doing something on the scale of Striving Readers, I hope they will provide preference points in the application process for applicants who propose to use approaches with solid evidence of effectiveness. I also hope government will continue to fund development and evaluation of programs to address enduring problems of education, so that when they do start providing incentives for using proven programs, there will be many to choose from.

Just like the earthworm research in Awakenings, finding out conclusively what doesn’t work is a contribution to science. But in education, how many times do we have to learn what doesn’t work before we start supporting programs that we know do work? It’s time to recognize on a broad scale that programs proven to work in rigorous evaluations are more likely than other approaches to work again if implemented well in similar settings. Even earthworms learn from experience. Shouldn’t we do the same?

It’s Proven. It’s Perfect. I’ll Change It.

I recently visited Kraków, Poland. It’s a wonderful city. One of its highlights is a beautiful royal castle, built in the 16th century by an Italian architect. The castle had just one problem. It had no interior hallways. To go from room to room, you had to go outside onto a covered walkway overlooking a courtyard. This is a perfectly good idea in warm Italy, but in Poland it can get to 30 below in the winter!

In evidence-based reform in education, we have a related problem. As proven programs become more important in policy and practice, many educators ask whether programs proven in one place (say, warm Florida) will work in another (say, cold Minnesota). In fact, many critics of evidence-based reform argue that every school and every context is different, so it is impossible to have programs that can apply across all schools.

Obviously, the best answer to this problem is to test promising programs in many places, until we can say either that they work across a broad range of circumstances or that there are key context-based limiting variables. While the evidence may not yet (or ever) be definitive, it is worthwhile to use common sense about which factors might limit generalizability and which are unlikely to do so.

For example, for indoor activities such as teaching, hot and cold climates probably do not matter. Rural versus urban locations might matter a great deal for parent involvement, attendance, or after-school programs, where families’ physical proximity to the school and transportation issues are likely to be important. English learners certainly need accommodations that other children may not. Other ethnic-group or social-class differences may affect the applicability of particular programs in particular settings. But especially for classroom instructional approaches, it will most often be the case that kids are kids, schools are schools, and effective is effective. Programs that are effective with one broad set of schools and students are likely to be effective in other similar settings. Programs that work in urban Title I schools mainly serving native English-speaking students in several locations are likely to be effective in similar settings nationally, and so on.

Yet many educators, even those who believe in evidence, are willing to adopt proven programs, but then immediately want to change them, often in major ways. This is usually a very bad idea. The research field is full of examples of programs that consistently work when implemented as intended, but fail miserably when key elements are altered or completely left out. Unless there are major, clear reasons why changes must be made, it is best to implement programs as they were when they achieved their positive outcomes. Over time, as schools become familiar with a program, school leaders and teachers might discuss revisions with the program developer and implement sensible changes in line with the model’s theory of action and evidence base.

Faithful replication is important for an obvious reason: staying as close as possible to the factors that made the original program effective. However, there is a less obvious reason that replications should be as true as possible to the original, at least in the first year or early years of implementation. The reason is that when educators complain about a new program “taking away their creativity,” they are often in fact looking for ways to keep doing what they have always done. And if educators do what they have always done, they will get what they have always gotten, as the old saying goes.

Innovation within proven programs can be a good thing, once schools have fully embraced and thoroughly understand a given program and can see where it could be improved or adapted to their circumstances. However, innovating too early in a replication is likely to turn the best of programs into mush.

It is perfectly fair for school districts, schools, and teachers to examine the evidence supporting a new approach and to judge just how robust that evidence is: has the program proved itself across a reasonable range of school environments not radically unlike their own? But if the answer to that question is yes, then fidelity of implementation should be the guiding principle in adopting the new program.

Kraków’s castle should have had interior halls to adapt to the cold Polish winters. However, if everyone’s untested ideas about castle design had been thrown into the mix from the outset, the castle might never have stood up in the first place!