Half a Worm: Why Education Policy Needs High Evidence Standards

There is a very old joke that goes like this:

What’s the second-worst thing to find in your apple?  A worm.

What’s the worst?  Half a worm.

The ESSA evidence standards provide clearer definitions of “strong,” “moderate,” and “promising” levels of evidence than have ever existed in law or regulation. Yet they still leave room for interpretation.  The problem is that if you define evidence-based too narrowly, too few programs will qualify.  But if you define evidence-based too broadly, it loses its meaning.

We’ve already experienced what happens with a too-permissive definition of evidence.  In No Child Left Behind, “scientifically-based research” was famously mentioned 110 times.  The impact of this, however, was minimal, as everyone soon realized that the term “scientifically-based” could be applied to just about anything.

Today, we are in a much better position than we were in 2002 to insist on relatively strict evidence of effectiveness, both because we have better agreement about what constitutes evidence of effectiveness and because we have a far greater number of programs that would meet a high standard.  The ESSA definitions are a good consensus example.  Essentially, they define programs with “strong evidence of effectiveness” as those with at least one randomized study showing positive impacts using rigorous methods, and “moderate evidence of effectiveness” as those with at least one quasi-experimental study.  “Promising” is less well-defined, but requires at least one correlational study with a positive outcome.
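As a rough illustration, the tier logic paraphrased above can be sketched in a few lines of Python. This is only a simplification of the summary in this post, not the statutory language. The `Study` record, its field names, and the `essa_tier` function are hypothetical names invented for the example, and the sketch assumes that only studies with positive outcomes count toward any tier.

```python
from dataclasses import dataclass

@dataclass
class Study:
    design: str     # "randomized", "quasi-experimental", or "correlational"
    positive: bool  # True if the study found a positive outcome

def essa_tier(studies):
    """Return the highest tier supported by a program's studies, or None.

    Assumption: a study only counts toward a tier if its outcome was positive.
    """
    designs = {s.design for s in studies if s.positive}
    if "randomized" in designs:
        return "strong"
    if "quasi-experimental" in designs:
        return "moderate"
    if "correlational" in designs:
        return "promising"
    return None  # no qualifying study

# Example: a null randomized study plus a positive quasi-experiment
print(essa_tier([Study("randomized", False), Study("quasi-experimental", True)]))
```

Real reviews apply many further criteria (sample size, attrition, measures used), so a lookup like this captures only the headline distinction among the three tiers.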

Where the half-a-worm concept comes in, however, is that we should not use a broader definition of “evidence-based”.  For example, ESSA has a definition of “strong theory.”  To me, that is going too far, and begins to water down the concept.  What program in all of education cannot justify a “strong theory of action”?

Further, even in the top categories, there are important questions about what qualifies. In school-level studies, should we insist on school-level analyses (i.e., HLM)? Every methodologist would say yes, as I do, but this is not specified. Should we accept researcher-made measures? I say no, based on a great deal of evidence indicating that such measures inflate effects.

Fortunately, due to investments made by IES, i3, and other funders, the number of programs that meet strict standards has grown rapidly. Our Evidence for ESSA website (www.evidenceforessa.org) has so far identified 101 PK-12 reading and math programs, using strict standards consistent with ESSA definitions. Among these, more than 60% meet the “strong” standard. There are enough proven programs in every subject and grade level to give educators choices among proven programs. And we add more each week.

This large number of programs meeting strict evidence standards means that insisting on rigorous evaluations, within reason, does not mean that we end up with too few programs to choose among. We can have our apple pie and eat it, too.

I’d love to see federal programs of all kinds encouraging use of programs with rigorous evidence of effectiveness. But I’d rather see a few programs that meet a strict definition of “proven” than a lot of programs that only meet a loose definition. Twenty good apples are much better than applesauce of dubious origins!

This blog is sponsored by the Laura and John Arnold Foundation


Proven Tutoring Approaches: The Path to Universal Proficiency

There are lots of problems in education that are fundamentally difficult. Ensuring success in early reading, however, is an exception. We know what skills children need in order to succeed in reading. No area of teaching has a better basis in high-quality research. Yet the reading performance of America’s children is not improving at an adequate pace. Reading scores have hardly changed in the past decade, and gaps between white, African-American, and Hispanic students have been resistant to change.

In light of the rapid growth in the evidence base, and of the policy focus on early reading at the federal and state levels, this is shameful. We already know a great deal about how to improve early reading, and we know how to learn more. Yet our knowledge is not translating into improved practice and improved outcomes on a large enough scale.

There are lots of complex problems in education, and complex solutions. But here’s a really simple solution:


Over the past 30 years researchers have experimented with all sorts of approaches to improve students’ reading achievement. There are many proven and promising classroom approaches, and such programs should be used with all students in initial teaching as broadly as possible. Effective classroom instruction, universal access to eyeglasses, and other proven approaches could surely reduce the number of students who need tutors. But at the end of the day, every child must read well. And the only tool we have that can reliably make a substantial difference at scale with struggling readers is tutors, using proven one-to-one or small-group methods.

I realized again why tutors are so important in a proposal I’m making to the State of Maryland, which wants to bring all or nearly all students to “proficient” on its state test, the PARCC. “Proficient” on the PARCC is a score of 750, with a standard deviation of about 50. The state mean is currently around 740. I made a colorful chart (below) showing “bands” of scores below 750 to show how far students have to go to get to 750.


Each band covers an effect size of 0.20. There are several classroom reading programs with effect sizes this large, so if schools adopted them, they could move children scoring at 740 to 750. These programs can be found at www.evidenceforessa.org. But implementing these programs alone still leaves half of the state’s children not reaching “proficient.”

What about students at 720? They need 30 points, or +0.60. The best one-to-one tutoring can achieve outcomes like this, but these are the only solutions that can.
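The arithmetic behind these figures is just the point gap divided by the standard deviation. Here is a minimal sketch using the numbers quoted above (a cutoff of 750 and a standard deviation of about 50). The function name `effect_size_needed` is made up for illustration.

```python
PROFICIENT = 750  # PARCC "proficient" cutoff, as stated above
SD = 50           # approximate standard deviation, as stated above

def effect_size_needed(score, cutoff=PROFICIENT, sd=SD):
    """Effect size (in standard deviation units) needed to reach the cutoff."""
    gap = max(cutoff - score, 0)  # students already at or above the cutoff need 0
    return gap / sd

for score in (740, 720):
    print(score, effect_size_needed(score))
```

A student at 740 needs 10 points (10 / 50 = 0.20, one band), and a student at 720 needs 30 points (30 / 50 = 0.60, three bands).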

Here are mean effect sizes for various reading tutoring programs with strong evidence:



As this chart shows, one-to-one tutoring, by well-trained teachers or paraprofessionals using proven programs, can potentially have the impacts needed to bring most students scoring 720 (needing 30 points or an effect size of +0.60) to proficiency (750). Three programs have reported effect sizes of at least +0.60, and several others have approached this level. But what about students scoring below 720?

So far I’ve been sticking to established facts, studies of tutoring that are, in most cases, already being disseminated. Now I’m entering the region of well-justified supposition. Almost all studies of tutoring last just one year or less. But what if the lowest achievers could receive multiple years of tutoring, if necessary?

One study, over 2½ years, did find an effect size of +0.68 for one-to-one tutoring. Could we do better than that? Most likely. In addition to providing multiple years of tutoring, it should be possible to design programs to achieve one-year effect sizes of +1.00 or more. These may incorporate technology or personalized approaches specific to the needs of individual children. Using the best programs for multiple years, if necessary, could increase outcomes further. Also, as noted earlier, using proven programs other than tutoring for all students may increase outcomes for students who also receive tutoring.

But isn’t tutoring expensive? Yes it is. But it is not as expensive as the costs of reading failure: Remediation, special education, disappointment, and delinquency. If we could greatly improve the reading performance of low achievers, this would of course reduce inequities across the board. Reducing inequities in educational outcomes could reduce inequities in our entire society, an outcome of enormous importance.

Even providing a substantial amount of teacher tutoring could, by my calculations, increase total state education expenditures (in Maryland) by only about 12%. These costs could be reduced greatly or even eliminated by reducing expenditures on ineffective programs, reducing special education placements, and other savings. Having some tutoring done by part-time teachers may reduce costs. Using small-group tutoring (fewer than 6 students at a time) for students with milder problems may save a great deal of money. Even at full cost, the necessary funding could be phased in over a period of 6 years at 2% a year.
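For readers who want to see the phase-in spelled out, here is a small sketch. It assumes each year's step is 2% of the baseline budget (simple addition, not compounding), which is one reading of the figures above. The variable names are invented for the example.

```python
TOTAL_INCREASE_PCT = 12  # full-cost increase over baseline, as estimated above
YEARS = 6                # phase-in period, as stated above

# Assumption: equal steps, each measured against the original baseline budget
step = TOTAL_INCREASE_PCT / YEARS

schedule = [(year, step * year) for year in range(1, YEARS + 1)]

for year, cumulative in schedule:
    print(f"Year {year}: +{cumulative:.0f}% over baseline")
```

Under that assumption, the cumulative increase reaches the full 12% in the sixth year.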

The bottom line is that low achievement and the large gaps associated with economic and racial differences could be improved a great deal using methods already proven to be effective and already widely available. Educators and policy makers are always promising policies that bring every child to proficiency: “No Child Left Behind” and “Every Student Succeeds” come to mind. If these outcomes are truly possible, why shouldn’t we be pursuing them with every resource at our disposal?

How Networks of Proven Programs Could Help State-Level Reform

America is a great country, but it presents a serious problem for school reformers. The problem is that it is honkin’ humongous, with strong traditions of state and local autonomy. Reforming even a single state is a huge task, because most of our states are the size of entire small nations. (My small state, Maryland, has about the population of Scotland, for example.) And states, districts, schools, and teachers are all kind of prickly about taking orders from anyone further up the hierarchy.

The Every Student Succeeds Act (ESSA) puts a particular emphasis on state and local control, a relief after the emphasis on mandates from Washington central to No Child Left Behind. ESSA also contains a welcome focus on using evidence-based programs.

ESSA is new, and state, district, and school leaders are just now grappling with how to use the ESSA opportunities to move forward. How can states hope to bring about major change on a large scale, working one school at a time?

The solution to this problem might be for states, large districts, or coalitions of smaller districts to offer a set of proven, whole school reform models to a number of schools in need of assistance, such as Title I schools. School leaders and their staffs would have opportunities to learn about programs, find some appropriate to their needs, ideally visit schools using the programs now, and match the programs with their own needs, derived from a thorough needs assessment. Ultimately, all school staff might vote, and at least 80% would have to vote in favor. The state or district would set aside federal or state funds to enable schools to afford the program they have chosen.

All schools in the state, district, or consortium that selected a given program could then form a network. The network would have regular meetings among principals, teachers of similar grades, and other job-alike staff members, to provide mutual help, share ideas, and interact cost-effectively with representatives of program providers. Network members would share a common language, and drawing from common experiences could be of genuine help to each other. The network arrangement would also reduce the costs of adopting each program, because it would create local scale to reduce costs of training and coaching.

The benefits of such a plan would be many. First, schools would be implementing programs they selected, and school staffs would be likely to put their hearts and minds into making the program work. Because the programs would all have been proven to be effective in the first place, they would be very likely to be measurably effective in these applications.

There might be schools that would initially opt not to choose anything, and this would be fine. Such schools would have opportunities each year to join colleagues in one of the expanding networks as they see that the programs are working in their own districts or regions.

As the system moved forward, it would become possible to do high-quality evaluations of each of the programs, contributing to knowledge of how each program works in particular districts or areas.

As the number of networked schools increased across a given state, it would begin to see widespread and substantial gains on state assessments. Further, all involved in this process would be learning not only the average effectiveness of each program, but also how to make each one work, and how to use programs to succeed with particular subgroups or solve particular problems. Networks, program leaders, and state, district, and school leaders would get smarter each year about how to use proven programs to accelerate learning among students.

How could this all work at scale? The answer is that there are nonprofit organizations and companies that are already capable of working with hundreds of schools. At the elementary level, examples include the Children’s Literacy Initiative, Positive Action, and our own Success for All. At the secondary level, examples include BARR, the Talent Development High School, Reading Apprenticeship, and the Institute for Student Achievement. Other programs currently work with specific curricula and could partner with other programs to provide whole-school approaches, or some schools may only want or need to work on narrower problems. The programs are not that expensive at scale (few are more than $100 per student per year), and could be paid for with federal funds such as school improvement, Title I, Title II, and Striving Readers, or with state or local funds.

The proven programs do not ask schools to reinvent the wheel, but rather to put their efforts and resources toward adopting and effectively implementing proven programs and then making necessary adaptations to meet local needs and circumstances. Over time this would build capacity within each state, so that local people could take increasing responsibility for training and coaching, further reducing costs and increasing local “flavor.”

We’ve given mandates 30 years to show their effectiveness. ESSA offers new opportunities to do things differently, allowing states and districts greater freedom to experiment. It also strongly encourages the use of evidence. This would be an ideal time to try a simple idea: use what works.


Where Will the Capacity for School-by-School Reform Come From?

In recent months, I’ve had a number of conversations with state and district leaders about implementing the ESSA evidence standards. To its credit, ESSA diminishes federal micromanaging, and gives more autonomy to states and locals, but now that the states and locals are in charge, how are they going to achieve greater success? One state department leader described his situation in ESSA as being like that of a dog who’s been chasing cars for years, and then finally catches one. Now what?

ESSA encourages states and local districts to help schools adopt and effectively implement proven programs. For school improvement, portions of Title II, and Striving Readers, ESSA requires use of proven programs. Initially, state and district folks were worried about how to identify proven programs, though things are progressing on that front (see, for example, www.evidenceforessa.org). But now I’m hearing a lot more concern about capacity to help all those individual schools do needs assessments, select proven programs aligned with their needs, and implement them with thought, care, and knowledgeable application of implementation science.

I’ve been in several meetings where state and local folks ask federal folks how they are supposed to implement ESSA. “Regional educational labs will help you!” they suggest. With all due respect to my friends in the RELs, this is going to be a heavy lift. There are ten of them, in a country with about 52,000 Title I schoolwide projects. So each REL is responsible for, on average, five states, 1,400 districts, and 5,200 high-poverty schools. For this reason, RELs have long been primarily expected to work with state departments. There are just not enough of them to serve many individual districts, much less schools.
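The per-REL load is simple division, sketched below. The counts of RELs and Title I schoolwide projects come from the paragraph above, and 50 is the number of states. The district total of 14,000 is an assumption back-computed from the 1,400-per-REL figure, not a number stated in this post.

```python
RELS = 10  # regional educational labs, as stated above

national_totals = {
    "states": 50,
    "districts": 14_000,  # ASSUMPTION: implied by 1,400 per REL, not stated directly
    "Title I schoolwide projects": 52_000,  # approximate, as stated above
}

# Average load per REL if the work were spread evenly
per_rel = {name: total // RELS for name, total in national_totals.items()}

for name, load in per_rel.items():
    print(f"{name}: about {load:,} per REL")
```

An even split is of course a simplification, since REL regions differ in size, but it shows the order of magnitude involved.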

State departments of education and districts can help schools select and implement proven programs. For example, they can disseminate information on proven programs, make sure that recommended programs have adequate capacity, and perhaps hold effective-methods “fairs” to introduce people in their state to program providers. But states and districts rarely have capacity to implement proven programs themselves. It’s very hard to build state and local capacity to support specific proven programs. For example, when the frequent downturns in state or district funding come, the first departments to be cut back or eliminated often involve professional development. For this reason, few state departments or districts have large, experienced professional development staffs. Further, constant changes in state and local superintendents, boards, and funding levels make it difficult to build up professional development capacity over a period of years.

Because of these problems, schools have often been left to make up their own approaches to school reform. This happened on a wide scale in the NCLB School Improvement Grants (SIG) program, where federal mandates required very specific structural changes but left the essentials (teaching, curriculum, and professional development) up to the locals. The MDRC evaluation of SIG schools found that they made no better gains than similar, non-SIG schools.

Yet there is substantial underutilized capacity available to help schools across the U.S. to adopt proven programs. This capacity resides in the many organizations (both non-profit and for-profit) that originally created the proven programs, provided the professional development that caused them to meet the “proven” standard, and likely built infrastructure to ensure quality, sustainability, and growth potential.

The organizations that created proven programs have obvious advantages (their programs are known to work), but they also have several less obvious advantages. One is that organizations built to support a specific program have a dedicated focus on that program. They build expertise on every aspect of the program. As they grow, they hire capable coaches, usually ones who have already shown their skills in implementing or leading the program at the building level. Unlike states and districts that often live in constant turmoil, reform organizations or for-profit professional development organizations are likely to have stable leadership over time. In fact, for a high-poverty school engaged with a program provider, that provider and its leadership may be the only partner stable enough to help the school with its core teaching for many years.

State and district leaders play major roles in accountability, management, quality assurance, and personnel, among many other issues. With respect to implementation of proven programs, they have to set up conditions in which schools can make informed choices, monitor the performance of provider organizations, evaluate outcomes, and ensure that schools have the resources and supports they need. But truly reforming hundreds of schools in need of proven programs one at a time is not realistic for most states and districts, at least not without help. It makes a lot more sense to seek capacity in organizations designed to provide targeted professional development services on proven programs, and then coordinate with these providers to ensure benefits for students.
