Evidence and Policy: If You Want to Make a Silk Purse, Why Not Start With…Silk?

Everyone knows that you can’t make a silk purse out of a sow’s ear. This proverb goes back to the 1500s. Yet in education policy, we are constantly trying to achieve stellar results with school and classroom programs of unknown effectiveness, or even programs known to be ineffective, when proven effective programs are readily available.

Note that I am not criticizing teachers. They do the best they can with the tools they have. What I am concerned about is the quality of those tools: the programs and professional development that teachers receive to help them succeed with their students.

An excellent case in point was School Improvement Grants (SIG), a major provision of No Child Left Behind (NCLB). SIG provided substantial grants to schools scoring in the lowest 5% of their states. For most of its existence, SIG required schools seeking funding to choose among four models. Two of these, school closure and charterization, were rarely selected. Instead, most SIG schools selected either “turnaround” (replacing the principal and at least 50% of the staff) or, the most popular, “transformation” (replacing the principal, using data to inform instruction, lengthening the school day or year, and evaluating teachers based on the achievement growth of their students). However, a rigorous, large-scale evaluation of SIG by Mathematica showed no achievement benefits for schools that received SIG grants, compared to similar schools that did not. Ultimately, SIG spent more than $7 billion, an amount that we in Baltimore, at least, consider to be a lot of money. The tragedy, however, is not just the waste of so much money, but the dashing of so many hopes for meaningful improvement.

This is where the silk purse/sow’s ear analogy comes in. Each of the options among which SIG schools had to choose was composed of components that either lacked evidence of effectiveness or actually had evidence of ineffectiveness. If the components of each option are not known to be effective, then why would anyone expect a combination of them to be effective?

Evidence on school closure has found that this strategy diminishes student achievement for a few years, after which student performance returns to where it was before. Research on charter schools by CREDO (2013) has found an average effect size of zero for charters. The exception is “no-excuses” charters, such as KIPP and Success Academies, but these charters only accept students whose parents volunteer, not whole failing schools. Turnaround and transformation schools both require a change of principal, which introduces chaos and, as far as I know, has never been found to improve achievement. The same is true of replacing at least 50% of the teachers. Lots of chaos, no evidence of effectiveness. The other required elements of the popular “transformation” model have been found to have either no impact (e.g., benchmark assessments to inform teachers about progress; Inns et al., 2019) or small effects (e.g., lengthening the school day or year; Figlio et al., 2018).

Most importantly, to my knowledge, no one ever did a randomized evaluation of the entire transformation model, with all components included. We did not find out what the joint effect was until the Mathematica study. Guess what? Sewing together swatches of sows’ ears did not produce a silk purse.

With a tiny proportion of that $7 billion, the Department of Education could have identified and tested out numerous well-researched, replicable programs and then offered SIG schools a choice among the ones that worked best. A selection of silk purses, all made from 100% pure silk. Doesn’t that sound like a better idea?

In later blogs I’ll say more about how the federal government could ensure the success of educational initiatives by giving schools access to federal resources to adopt and implement proven programs designed to accomplish the goals of the legislation.

References

Figlio, D., Holden, K. L., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence Means Different Things in ESSA and NCLB

Whenever I talk or write about the new evidence standards in the Every Student Succeeds Act (ESSA), someone is bound to ask how this is different from No Child Left Behind (NCLB). Didn’t NCLB also emphasize using programs and practices “based on scientifically-based research?”

Though they look similar on the surface, evidence in ESSA is very different from evidence in NCLB. In NCLB, “scientifically-based research” just meant that a given program or practice was generally consistent with principles that had been established in research, and almost any program could be said to be “based on” research. In contrast, ESSA standards encourage the use of specific programs and practices that have themselves been evaluated. ESSA defines strong, moderate, and promising levels of evidence for programs and practices with at least one significantly positive outcome in a randomized, matched, or correlational study, respectively. NCLB had nothing of the sort.

To illustrate the difference, consider a medical example. In a recent blog, I told the story of how medical researchers had long believed that stress caused ulcers. Had NCLB’s evidence provision applied to ulcer treatment, all medicines and therapies based on reducing or managing stress, from yoga to tranquilizers, might be considered “based on scientifically-based research” and therefore encouraged. Yet none of these stress-reduction treatments were actually proven to work; they were just consistent with current understandings about the origin of ulcers, which were wrong (bacteria, not stress, cause ulcers).

If ESSA were applied to ulcer treatment, it would demand evidence that a particular medicine or therapy actually improved or eliminated ulcers. ESSA evidence standards wouldn’t care whether a treatment was based on stress theory or bacteria theory, as long as there was good evidence that the actual treatment itself worked in practice, as demonstrated in high-quality research.

Getting back to education, NCLB’s “scientifically-based research” was particularly intended to promote the use of systematic phonics in beginning reading. There was plenty of evidence summarized by the National Reading Panel that a phonetic approach is a good idea, but most of that research was from controlled lab studies, small-scale experiments, and correlations. What the National Reading Panel definitely did not say was that any particular approach to phonics teaching was effective, only that phonics was a generically good idea.

One problem with NCLB’s “scientifically-based research” standard was that a lot of things go into making a program effective. One phonics program might provide excellent materials, extensive professional development, in-class coaching to help teachers use phonetic strategies, effective motivation strategies to get kids excited about phonics, effective grouping strategies to ensure that instruction is tailored to students’ needs, and regular assessments to keep track of students’ progress in reading. Another, equally phonetic program might teach phonics to students on a one-to-one basis. A third phonics program might consist of a textbook that comes with a free half-day training before school opens.

According to NCLB, all three of these approaches are equally “based on scientifically-based research.” But anyone can see that the first two, lots of PD and one-to-one tutoring, are way more likely to work. ESSA evidence standards insist that the actual approaches to be disseminated to schools be tested in comparison to control groups, not assumed to work because they correspond with accepted theory or basic research.

“Scientifically-based research” in NCLB was a major advance in its time, because it was the first time evidence had been mentioned so prominently in the main federal education law. Yet educators soon learned that just about anything could be justified as “based on scientifically-based research,” because there are bound to be a few articles out there supporting any educational idea. Fortunately, enthusiasm about “scientifically-based” led to the creation of the Institute of Education Sciences (IES) and, later, to Investing in Innovation (i3), which set to work funding and encouraging development and rigorous evaluations of specific, replicable programs. The good work of IES and i3 paved the way for the ESSA evidence standards, because there are now many more rigorously evaluated programs. NCLB never could have specified ESSA-like evidence standards, because at the time there would have been too few qualifying programs.

Sooner or later, policy and practice in education will follow medicine, agriculture, technology, and other fields in relying on solid evidence to the maximum degree possible. “Scientifically-based research” in NCLB was a first tentative step in that direction, and the stronger ESSA standards are another. If development and research continue or accelerate, successive education laws will offer stronger and stronger encouragement and assistance to help schools and districts select and implement proven programs. Our kids will be the winners.

Evidence and the ESSA

The U.S. House of Representatives last week passed the new and vastly improved version of what is now being called the Every Student Succeeds Act (ESSA), the successor to No Child Left Behind (NCLB) as the latest reauthorization of the Elementary and Secondary Education Act (ESEA). For people (such as me) who believe that evidence will provide salvation for education in our country, the House and Senate ESSA conference bill has a lot to like, especially in comparison to the earlier draft.

ESSA defines four categories of evidence based on their strength:

  1. “strong evidence,” meaning supported by at least one randomized study;
  2. “moderate evidence,” meaning supported by at least one quasi-experimental study;
  3. “promising evidence,” meaning supported by at least one correlational study with pretests as covariates; and
  4. programs with a rationale based on high-quality research or a positive evaluation that are likely to improve student or other relevant outcomes and that are undergoing evaluation, often referred to as “strong theory” (though the bill does not use that term).

The top three categories effectively constitute proven programs, as I read the law. For example, seven competitive funding programs would give preference points to applications with evidence meeting one of those categories, and a replacement for School Improvement Grants requires local educational agencies to include “evidence-based interventions” in their comprehensive support and improvement plans.

One good thing about this definition is that for the first time, it unequivocally conveys government recognition that not all forms of evaluation are created equal. Another is that it plants the idea that educators should be looking for proven programs, as defined by rigorous, sharp-edged standards. This is not new to readers of this blog, but is very new to most educators and policy makers.

Another positive feature of ESSA where evidence is concerned is that it includes a new tiered-evidence provision called Education Innovation and Research (EIR) that would effectively replace the Investing in Innovation (i3) program. Like i3, it is a tiered grant program that will support the development, evaluation, and scale-up of local, innovative education programs based on the level of evidence behind them, but without the limitation of program priorities established by the U.S. Department of Education. It is a real relief to see Congress value continued development and evaluation of innovations in education.

Of course, there are also some potential problems, depending on how ESSA is administered. First, the definition of “evidence-based” includes correlational studies, which provide much weaker evidence of effectiveness than experiments. Worse, if “strong theory” is widely used, then the whole evidence effort may turn out to make no difference, as any program on Earth can be said to have “strong theory.”

A strong theme throughout ESSA is moving away from federal control of education toward state and local control. Philosophically, I have no problem with this, but it could cause trouble for the evidence movement, which has been largely focused on policy in Washington. This shift creates a strong rationale for the evidence movement to expand its focus to state and local leaders, not just federal ones, and that would be a positive development in itself.

In education policy, it’s easy for well-meaning language to be watered down or disregarded in practice. Early on in NCLB, for example, evidence fans were excited by the 110 mentions of “scientifically-based research,” but “scientifically-based” was so loosely defined that it ended up changing very little in school practice (though it did lead to the creation of the Institute of Education Sciences, which mattered a great deal).

So recognizing that things could still go terribly wrong, I think it is nevertheless important to celebrate the potentially monumental achievement represented by ESSA. The evidence parts of the Act were certainly aided by the tireless efforts of numerous organizations that worked collectively to create scrupulously bipartisan coalitions in the House and Senate to support evidence in government. Just seeing both sides of the aisle and both sides of the Capitol collaborate in this crucial effort gives me hope that even in our polarized times, bipartisanship and bicameralism are still possible when children are involved. Congratulations to all who were responsible for this achievement.

Evidence-Based vs. Evidence-Proven

Way back in 2001, when we were all a lot younger and more naïve, Congress passed the No Child Left Behind Act (NCLB). It had all kinds of ideas in it, some better than others, but those of us who care about evidence were ecstatic about the often-repeated requirement that federal funds be used for programs “based on scientifically-based research (SBR),” particularly “based on scientifically-based reading research (SBRR).” SBR and SBRR were famously mentioned 110 times in the legislation.

The emphasis on research was certainly novel, and even revolutionary in many ways. It led to many positive actions. Soon after NCLB, Congress authorized the Institute of Education Sciences (IES), which has greatly increased the rigor and sophistication of research in education. IES and other agencies promoted training of graduate students in advanced statistical methods and supported the founding of the Society for Research on Educational Effectiveness (SREE), which has itself had considerable impact on rigorous research. The U.S. Department of Education has commissioned high-quality evaluations of a variety of interventions, such as computer-assisted instruction, early childhood curricula, and secondary reading programs. IES funded development and evaluation of numerous new programs, and the methodologies promoted by IES are essential to Investing in Innovation (i3), a larger effort focused on development and evaluation of promising programs in K-12 education.

The one serious limitation of the evidence movement up to the present is that while it has greatly improved research and methodology, it has not yet had much impact on practices in schools. Part of the problem is just that it takes time to build up enough of a rigorous evidence base to affect practice. However, another part of the problem is that from the outset, “scientifically-based research” was too squishy a concept. Programs or practices were said to be “based on scientifically-based research” if they generally went along with accepted wisdom, even if the specific approaches involved had never been evaluated. For example, “scientifically-based reading research” was widely interpreted to support any program that included the five elements emphasized in the 2000 National Reading Panel (NRP) report: phonemic awareness, phonics, vocabulary, comprehension, and fluency. Every reading educator and researcher knows this list, and most subscribe to it (and should do so). Yet since NCLB was enacted, National Assessment of Educational Progress reading scores have hardly budged, and evaluations of specific programs that just train teachers in the five NRP elements have had spotty outcomes, at best.

The problem with SBR/SBRR is that just about any modern instructional program can claim to incorporate the underlying research. “Based on…” is a weak standard, subject to anyone’s interpretation.

In contrast, government is beginning to specify levels of evidence that are far more demanding than “based on scientifically-based research.” For example, the What Works Clearinghouse (WWC), the Education Department General Administrative Regulations (EDGAR), and i3 regulations have sophisticated definitions of proven programs. These typically require comparing a program to a control group, using fair and valid measures, appropriate statistical methods, and so on.

The more rigorous definitions of “evidence-proven” mean a great deal as education policies begin to encourage or provide incentives for schools to adopt proven programs. If programs only have to be “based on scientifically-based research,” then just about anything will qualify, and evidence will continue to make little difference in the programs children receive. If more stringent definitions of “evidence-proven” are used, there is a far greater chance that schools will be able to identify what really works and make informed choices among proven approaches.

Evidence-based and evidence-proven differ by just one word, but if evidence is truly to matter in policy, this is the word we have to get right.