Extraordinary Gains: Making Them Last

One of the great frustrations of evidence-based reform in education is that while we do have some interventions that have a strong impact on students’ learning, these outcomes usually fade over time. The classic example is intensive, high-quality preschool programs. There is no question about the short-term impacts of quality preschool, but after fifty years, the Perry Preschool study remains the only case in which a randomized experiment found long-term positive impacts of preschool. I think the belief in the Perry Preschool’s long-term impacts conditioned many of us to expect amazing long-term impacts of early interventions of all kinds, but the Perry Preschool evaluation was flawed in several ways, and later randomized studies, such as that of the Tennessee Voluntary Prekindergarten Program, do not find such lasting impacts. There have been similar difficulties documenting long-term impacts of the Reading Recovery tutoring program. I have been looking at research on summer school (Neitzel et al., 2020), and found a few summer programs for kindergarteners and first graders that had exceptional impacts on end-of-summer reading measures, but these had faded by the following spring.


Advocates for these and other intensive interventions frequently express an expectation that resource-intensive interventions at key developmental turning points can transform the achievement trajectories of students performing below grade level or otherwise at risk. Many educators and researchers believe that after successful early intervention, students can participate in regular classroom teaching and will continue to advance with their agemates. However, for many students, this is unlikely. For example, imagine a struggling third-grade girl reading at the first-grade level. After sixteen weeks of daily 30-minute tutoring, she has advanced to grade-level reading. However, after finishing her course of tutoring, the girl may experience slow progress. She will probably not forget what she has learned, but other students, who reached grade-level reading without tutoring, may make more rapid progress than she does, because whatever factors caused her to be two years below grade level in the third grade may continue to slow her progress even after tutoring succeeds. By sixth grade, without continuing intervention, she might be well below grade level again, perhaps better off than she would have been without tutoring, but not at grade level.

But what if we knew, as the evidence clearly suggests, that one year of Perry Preschool or 60 lessons of Reading Recovery or seven weeks of intensive reading summer school was not sufficient to ensure long-lasting gains in achievement? What could we do to see that successful investments in intensive early interventions are built upon in subsequent years, so that formerly at-risk students not only maintain what they learned, but continue afterwards to make exceptional gains?

Clearly, we could build on early gains by continuing to provide intensive intervention every year, if that is what is needed, but that would be extremely expensive. Instead, imagine that each school had within it a small group of teachers and teacher assistants, whose job was to provide initial tutoring for students at risk, and then to monitor students’ progress and to strategically intervene to keep students on track. For the moment, I’ll call them an Excellence in Learning Team (XLT). This team would keep close track of the achievement of all at-risk and formerly at-risk students on frequent assessments, at least in reading and math. These staff members would track students’ trajectories toward grade level performance. If students fall off of that trajectory, members of the XLT would provide tutoring to the students, as long as necessary. My assumption is that a student who made brilliant progress with 60 tutoring sessions, for example, would not need another 60 sessions each year to stay on track toward grade level, but that perhaps 10 or 20 sessions would be sufficient.
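To make the monitoring idea concrete, here is a minimal sketch of the kind of rule an XLT might apply after each assessment. Everything specific in it is an assumption for illustration: the `Assessment` record, the use of grade-equivalent scores, the `needs_booster` function, and the quarter-year tolerance are hypothetical and would need to be set by research and by the assessments a school actually uses.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    # Hypothetical record; both fields are in grade-equivalent units.
    grade_placement: float  # e.g. 3.5 = midway through third grade
    reading_level: float    # measured reading level at that time


def needs_booster(history: list[Assessment], tolerance: float = 0.25) -> bool:
    """Flag a student who has fallen off the trajectory toward grade level.

    Assumed rule: the student is flagged when the latest gap between grade
    placement and reading level exceeds `tolerance` and that gap is not
    narrowing compared with the previous assessment.
    """
    if len(history) < 2:
        return False  # not enough data to judge a trajectory
    prev, last = history[-2], history[-1]
    prev_gap = prev.grade_placement - prev.reading_level
    last_gap = last.grade_placement - last.reading_level
    return last_gap > tolerance and last_gap >= prev_gap


# A formerly tutored student who was at grade level and is now slipping.
slipping = [Assessment(3.0, 3.0), Assessment(3.5, 3.1)]
# A student still below grade level but closing the gap.
catching_up = [Assessment(3.0, 2.0), Assessment(3.5, 3.0)]

print(needs_booster(slipping))     # flagged for a short course of tutoring
print(needs_booster(catching_up))  # left alone for now
```

The point of a rule like this is that a student who is still behind but gaining ground is left in regular instruction, while a student whose gap is opening up again gets a short, targeted course of tutoring before the gap becomes large.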

The XLT would need effective, targeted tools to quickly and efficiently help students whose progress is stumbling. For example, XLT tutors might have available computer-assisted tutoring modules to assist students who have mastered phonics but are having difficulty with fluency, or multi-syllabic words, or comprehension of narrative or factual text. In mathematics, they might have specific computer-assisted tutoring modules on place value, fractions, or word problems. The idea is precision and personalization, so that the time of every XLT member is used to maximum effect. From the students’ perspective, assistance from the XLT is not a designation (like special or remedial education), but rather time-limited assistance to enable all students to achieve ambitious and challenging goals.

The XLT would be most effective, I believe, if students have started with intensive tutoring, intensive summer school, or other focused interventions that can bring about rapid progress. This is essential early in students’ progression. Rapid progress at the outset not only sets students up for success, in an academic sense, but it also convinces the student and his or her teachers that he or she is capable of extraordinary progress. Such confidence is crucial.

As an analogy to what I am describing here, consider how you cook a stew. You first bring the stew to a boil, and then simmer for a long time. If you only brought the stew to a boil and then turned off the stove, the stew would never cook. If you only set the stove on simmer, but did not first bring the stew to a boil, it might take hours to cook, if it ever did. It is the sequence of intense energy followed by less intense but lengthy support that does the job. Or consider a rocket to the moon, which needs enormous energy to reach escape velocity, followed by continued but less intense energy to complete the trip.  In education, high-quality preschool or tutoring or intensive summer school can play the part of the boil, but this needs to be followed by long-term, lower-intensity, precisely targeted support.

I would love to see a program of research designed to figure out how to implement long-term support to enable at-risk students to experience rapid success and then build on that success for many years. This is how we will finally leverage our demonstrated ability to make big differences in intensive early intervention, by linking it to multi-year, life-changing services that ensure students’ success in the long term, where it really matters.


Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (2020). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org.* Manuscript submitted for publication.

*This new review of research on elementary programs for struggling readers had to be taken down because it is under review at a journal. For a copy of the current draft, contact Amanda Neitzel (aneitzel@jhu.edu).

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Evidence Affects School Change and Teacher-by-Teacher Change Differently

Nell Duke, now a distinguished professor at the University of Michigan, likes to tell a story about using cooperative learning as a young teacher. She had read a lot about cooperative learning and was excited to try it in her elementary class. However, not long after she started, her principal came to her class and asked her to step into the hall. “Miss Duke,” he said, “what in blazes are you doing in there?”

Nell told her principal all about cooperative learning, and how strongly the research supported it, and how her students were so excited to work in groups and help each other learn.

“Cooperative learning?” said her principal. “Well, I suppose that’s all right. But from now on could you do it quietly?”

Nell Duke’s story exemplifies one of the most important problems in research-based reform in education. Should research-based reform focus on teachers or on schools? Nell was following the evidence, and her students were enjoying the new method and seemed to be learning better because of it. Yet in her school, she was the only teacher using cooperative learning. As a result, she did not have the support or understanding of her principal, or even of her fellow teachers. Her principal had rules about keeping noise levels down, and he was not about to make an exception for one teacher.

However, the problem of evidence-based reform for teachers as opposed to schools goes far beyond the problems of one noisy classroom. The problem is that it is difficult to do reform one teacher at a time. In fact, it is very difficult to even do high-quality program evaluations at the teacher level, and as a result, most programs listed as effective in the What Works Clearinghouse or Evidence for ESSA are designed for use at least in whole grade levels, and often in whole schools. One reason for this is that it is more cost-effective to provide coaching to whole schools or grade levels. Most successful programs provide initial professional development to many teachers and then follow up with coaching visits to teachers using new methods, to give them feedback and encouragement. It is too expensive for most schools to provide extensive coaching to just one or a small number of teachers. Further, multiple teachers working together can support each other, ask each other questions, and visit each other’s classes. Principals and other administrative staff can support the whole school in using proven programs, but a principal responsible for many teachers is not likely to spend a lot of time learning about a method used by just one or two teachers.


When we were disseminating cooperative learning programs in the 1980s, we started off providing large workshops for anyone who wanted to attend. These were very popular and teachers loved them, but when we checked in a year later, many teachers were not using the methods they’d learned. Why? The answer was most often that teachers had difficulty sustaining a new program without much support from their leadership or colleagues. We’d found that on-site coaching was essential for quality implementation, but we could not provide coaching to widely dispersed schools. Instead, we began to focus on school-wide implementations of cooperative learning. This soon led to our development and successful evaluations of Success for All, as we learned that working with whole schools made it possible not only to ensure high-quality implementations of cooperative learning, but also to add in grouping strategies, tutoring for struggling readers, parent involvement approaches, and other elements that would have been impossible to do in a teacher-by-teacher approach to change.

In comparison with our experience with cooperative learning focused on individual teachers, Success for All has been both more effective and longer-lasting. The median Success for All school has used the program for 11 years, for example.

Of course, it is still important to have research-based strategies that teachers can use on their own. Cooperative learning itself can be used this way, as can proven strategies for classroom management, instruction, assessment, feedback, and much more. Yet it is often the case that practices suggested to individual teachers were in fact evaluated in whole school or grade levels. It is probably better for teachers to use programs proven effective in school-level research than to use unevaluated approaches, but teachers using such programs on their own should be aware that teachers in school-level evaluations probably received a lot of professional development and in-class coaching. To get the same results, individual teachers might visit others using the programs successfully, or at a minimum participate in social media conversations with other teachers using the same approaches.

Individual teachers interested in using proven programs and practices might do best to make common cause with colleagues and approach the principal about trying the new method in their grade level or in the school as a whole. This way, it is possible to obtain the benefits of school-wide implementation while playing an active role in the process of innovation.

There are never guarantees in any form of innovation, but teachers who are eager to improve their teaching and their students’ learning can work with receptive principals to systematically try out and informally evaluate promising approaches. Perhaps nothing would have changed the mind of Nell Duke’s principal, but most principals value initiative on the part of their teachers to try out likely solutions to improve students’ learning.

The number of children who need proven programs to reach their full potential is vast. Whenever possible, shouldn’t we try to reach larger numbers of students with well-conceived and well-supported implementations of proven teaching methods?

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Why Can’t Education Progress Like Medicine Does?

I recently saw an end-of-year article in The Washington Post called “19 Good Things That Happened in 2019.” Four of them were medical or public health breakthroughs. Scientists announced a new therapy for cystic fibrosis likely to benefit 90% of people with this terrible disease, incurable for most patients before now. The World Health Organization announced a new vaccine to prevent Ebola. The Bill and Melinda Gates Foundation announced that deaths of children before their fifth birthday have now dropped from 82 per thousand births in 1990 to 37 in 2019. The Centers for Disease Control reported a decline of 5.1 percent in deaths from drug overdoses in just one year, from 2017 to 2018.

Needless to say, breakthroughs in education did not make the list. In fact, I’ll bet there has never been an education breakthrough mentioned on such lists.

I get a lot of criticism from all sides for comparing education to medicine and public health. Most commonly, I’m told that it’s ever so much easier to give someone a pill than to change complex systems of education. That’s true enough, but not one of the 2019 medical or public health breakthroughs was anything like “taking a pill.” The cystic fibrosis cure involves a series of three treatments personalized to the genetic background of patients. It took decades to find and test this treatment. A vaccine for Ebola may be simple in concept, but it also took decades to develop. Also, Ebola occurs in very poor countries, where ensuring universal coverage with a vaccine is very complex. Reducing deaths of infants and toddlers took massive coordinated efforts of national governments, international organizations, and ongoing research and development. There is still much to do, of course, but the progress made so far is astonishing. Similarly, the drop in deaths due to overdoses required, and still requires, huge investments, cooperation between government agencies of all sorts, and constant research, development, and dissemination. In fact, I would argue that reducing infant deaths and overdose deaths strongly resemble what education would have to do to, for example, eliminate reading failure or enable all students to succeed at middle school mathematics. No one distinct intervention, no one miracle pill has by itself improved infant mortality or overdose mortality, and solutions for reading and math failure will similarly involve many elements and coordinated efforts among many government agencies, private foundations, and educators, as well as researchers and developers.

The difference between evidence-based reform in medicine/public health and education is, I believe, a difference in societal commitment to solving the problems. The general public, especially political leaders, tend to be rather complacent about educational failures. One of our past presidents said he wanted to help, but said, “We have more will than wallet” to solve educational problems. Another focused his education plans on recruiting volunteers to help with reading. These policies hardly communicate seriousness. In contrast, if medicine or public health can significantly reduce death or disease, it’s hard to be complacent.

Perhaps part of the motivational difference is due to the situations of powerful people. Anyone can get a disease, so powerful individuals are likely to have children or other relatives or friends who suffer from a given disease. In contrast, they may assume that children failing in school have inadequate parents, or parents who need improved job opportunities or economic security or decent housing, problems that will take decades and massive investments to solve. As a result, governments allocate little money for research, development, or dissemination of proven programs.

There is no doubt in my mind that we could, for example, eliminate early reading failure, using the same techniques used to eliminate diseases: research, development, practical experiments, and planful, rapid scale-up. It’s all a question of resources, political leadership, collaboration among many critical agencies and individuals, and a total commitment to getting the job done. The year reading failure drops to near zero nationwide, perhaps education will make the Washington Post list of “50 Good Things That Happened in 2050.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence-Based Reform and the Multi-Academy Trust

Recently, I was in England to visit Success for All (SFA) schools there. I saw two of the best SFA schools I’ve ever seen anywhere, Applegarth Primary School in Croydon, south of London, and Houldsworth Primary School in Sussex, southeast of London. Both are very high-poverty schools with histories of poor achievement, violence, and high staff turnover. Applegarth mostly serves the children of African immigrants, and Houldsworth mostly serves White students from very poor homes. Yet I saw every class in each school and in each one, children were highly engaged, excited, and learning like crazy. Both schools were once in the lowest one percent of achievement in England, yet both are now performing at or above national norms.

In my travels, I often see outstanding Success for All schools. However, in this case I learned about an important set of policies that goes beyond Success for All, but could have implications for evidence-based reform more broadly.


Both Applegarth and Houldsworth are in multi-academy trusts (MATs), the STEP Trust and the Unity Trust, respectively. Academies are much like charter schools in the U.S., and multi-academy trusts are organizations that run more than one academy. Academies are far more common in the U.K. than the U.S., constituting 22% of primary (i.e., elementary) schools and 68% of secondary schools. There are 1,170 multi-academy trusts, managing more than 5,000 of Britain’s 32,000 schools, or 16%. Multi-academy trusts can operate within a single local authority (school district) (like Success Academies in New York City) or may operate in many local authorities. Quite commonly, poorly-performing schools in a local authority, or stand-alone academies, may be offered to a successful and capable multi-academy trust, and these hand-overs explain much of the growth in multi-academy trusts in recent years.

What I saw in the STEP and Unity Trusts was something extraordinary. In each case, the exceptional schools I saw were serving as lead schools for the dissemination of Success for All. Staff in these schools had an explicit responsibility to train and mentor future principals, facilitators, and teachers, who spend a year at the lead school learning about SFA and their role in it, and then taking on their roles in a new SFA school elsewhere in the multi-academy trust. Over time, there are multiple lead schools, each of which takes responsibility to mentor new SFA schools other than their own. This cascading dissemination strategy, carried out in close partnership with the national SFA-UK non-profit organization, is likely to produce exceptional implementations.

I’m sure there must be problems with multi-academy trusts that I don’t know about, and in the absence of data on MATs throughout Britain, I would not take a position on them in general. But based on my limited experience with the STEP and Unity Trusts, this policy has particular potential as a means of disseminating very effective forms of programs proven effective in rigorous research.

First, multi-academy trusts have the opportunity and motivation to establish themselves as effective. Ordinary U.S. districts want to do well, of course, but they do not grow (or shrink) because of their success (or lack of it). In contrast, a multi-academy trust in the U.K. is more likely to seek out proven programs and implement them with care and competence, both to increase student success and to establish a “brand” based on their effective use of proven programs. Both STEP and Unity Trusts are building a reputation for succeeding with difficult schools using methods known to be effective. Using cascading professional development and mentoring from established schools to new ones, a multi-academy trust can build effectiveness and reputation.

Although the schools I saw were using Success for All, any multi-academy trust could use any proven program or programs to create positive outcomes and expand its reach and influence. As other multi-academy trusts see what the pioneers are accomplishing, they may decide to emulate them. One major advantage of multi-academy trusts is that, in sharp contrast to U.S. school districts, especially large, urban ones, they are likely to remain under consistent leadership for many years. Leaders of multi-academy trusts, and their staff and supporters, are likely to have time to transform practices gradually, knowing that they have the stable leadership needed for long-term change.

There is no magic in school governance arrangements, and no guarantee that many multi-academy trusts will use the available opportunities to implement and perfect proven strategies. Yet by their nature, multi-academy trusts have the opportunity to make a substantial difference in the education provided to all students, especially where they serve disadvantaged students. I look forward to watching plans unfold in the STEP and Unity Trusts, and to learning more about how the academy movement in the U.K. might provide a path toward widespread and thoughtful use of proven programs, benefiting very large numbers of students. And I’d love to see more U.S. charter networks and traditional school districts use cascading replication to scale up proven, whole-school approaches likely to improve outcomes in disadvantaged schools.


This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

On Replicability: Why We Don’t Celebrate Viking Day

I was recently in Oslo, Norway’s capital, and visited a wonderful museum displaying three Viking ships that had been buried with important people. The museum had all sorts of displays focused on the amazing exploits of Viking ships, always including the Viking landings in Newfoundland, about 500 years before Columbus. Since the 1960s, most people have known that Vikings, not Columbus, were the first Europeans to land in America. So why do we celebrate Columbus Day, not Viking Day?

Given the bloodthirsty actions of Columbus, easily rivaling those of the Vikings, we surely don’t prefer one to the other based on their charming personalities. Instead, we celebrate Columbus Day because what Columbus did was far more important. The Vikings knew how to get back to Newfoundland, but they were secretive about it. Columbus was eager to publicize and repeat his discovery. It was this focus on replication that opened the door to regular exchanges. The Vikings brought back salted cod. Columbus brought back a new world.

In educational research, academics often imagine that if they establish new theories or demonstrate new methods on a small scale, and then publish their results in reputable journals, their job is done. Call this the Viking model: they got what they wanted (promotions or salt cod), and who cares if ordinary people found out about it? Even if the Vikings had published their findings in the Viking Journal of Exploration, this would have had roughly the same effect as educational researchers publishing in their own research journals.

Columbus, in contrast, told everyone about his voyages, and very publicly repeated and extended them. His brutal leadership ended with him being sent back to Spain in chains, but his discoveries had resounding impacts that long outlived him.


Educational researchers only want to do good, but they are unlikely to have any impact at all unless they can make their ideas useful to educators. Many educational researchers would love to make their ideas into replicable programs, evaluate these programs in schools, and if they are found to be effective, disseminate them broadly. However, resources for the early stages of development and research are scarce. Yes, the Institute of Education Sciences (IES) and Education Innovation and Research (EIR) fund a lot of development projects, and Small Business Innovation Research (SBIR) provides small grants for this purpose to for-profit companies. Yet these funders support only a tiny proportion of the proposals they receive. In England, the Education Endowment Foundation (EEF) spends a lot on randomized evaluations of promising programs, but very little on development or early-stage research. Innovations that are funded by government or other funding very rarely end up being evaluated in large experiments, fewer still are found to be effective, and vanishingly few eventually enter widespread use. The exceptions are generally programs created by large for-profit companies, large and entrepreneurial non-profits, or other entities with proven capacity to develop, evaluate, support, and disseminate programs at scale. Even the most brilliant developers and researchers rarely have the interest, time, capital, business expertise, or infrastructure to nurture effective programs through all the steps necessary to bring a practical and effective program to market. As a result, most educational products introduced at scale to schools come from commercial publishers or software companies, who have the capital and expertise to create and disseminate educational programs, but serve a market that primarily wants attractive, inexpensive, easy-to-use materials, software, and professional development, and is not (yet) willing to pay for programs proven to be effective. I discussed this problem in a recent blog on technology, but the same dynamics apply to all innovations, tech and non-tech alike.

How Government Can Promote Proven, Replicable Programs

There is an old saying that Columbus personified the spirit of research. He didn’t know where he was going, he didn’t know where he was when he got there, and he did it all on government funding. The relevant part of this is the government funding. In Columbus’ time, only royalty could afford to support his voyage, and his grant from Queen Isabella was essential to his success. Yet Isabella was not interested in pure research. She was hoping that Columbus might open rich trade routes to the (east) Indies or China, or might find gold or silver, or might acquire valuable new lands for the crown (all of these things did eventually happen). Educational research, development, and dissemination face a similar situation. Because education is virtually a government monopoly, only government is capable of sustained, sizable funding of research, development, and dissemination, and only the U.S. government has the acknowledged responsibility to improve outcomes for the 50 million American children ages 4-18 in its care. So what can government do to accelerate the research-development-dissemination process?

  1. Contract with “seed bed” organizations capable of identifying and supporting innovators with ideas likely to make a difference in student learning. These organizations might be rewarded, in part, based on the number of proven programs they are able to help create, support, and (if effective) ultimately disseminate.
  2. Contract with independent third-party evaluators capable of doing rigorous evaluations of promising programs. These organizations would evaluate promising programs from any source, not just from seed bed companies, as they do now in IES, EIR, and EEF grants.
  3. Provide funding for innovators with demonstrated capacity to create programs likely to be effective and funding to disseminate them if they are proven effective. Developers may also contract with “seed bed” organizations for help with development and dissemination.
  4. Provide information and incentive funding to schools to encourage them to adopt proven programs, as described in a recent blog on technology.  Incentives should be available on a competitive basis to a broad set of schools, such as all Title I schools, to engage many schools in adoption of proven programs.

Evidence-based reform in education has made considerable progress in the past 15 years, both in finding positive examples that are in use today and in finding out what is not likely to make substantial differences. It is time for this movement to go beyond its early achievements to enter a new phase of professionalism, in which collaborations among developers, researchers, and disseminators can sustain a much faster and more reliable process of research, development, and dissemination. It’s time to move beyond the Viking stage of exploration to embrace the good parts of the collaboration between Columbus and Queen Isabella that made a substantial and lasting change in the whole world.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Do School Districts Really Have Difficulty Meeting ESSA Evidence Standards?

The Center on Education Policy (CEP) recently released a report on how school districts are responding to the Every Student Succeeds Act (ESSA) requirement that schools seeking school improvement grants select programs that meet ESSA’s strong, moderate, or promising standards of evidence. Education Week ran a story on the CEP report.

The report noted that many states, districts, and schools are taking the evidence requirements seriously, and are looking at websites and consulting with researchers to help them identify programs that meet the standards. This is all to the good.

However, the report also notes continuing problems districts and schools are having finding out “what works.” Two particular problems were cited. One was that districts and schools were not equipped to review research to find out what works. The other was that rural districts and schools found few programs proven effective in rural schools.

I find these concerns astounding. The same concerns were expressed when ESSA was first passed, in 2015. But that was almost four years ago. Since 2015, the What Works Clearinghouse has added information to help schools identify programs that meet the top two ESSA evidence categories, strong and moderate. Our own Evidence for ESSA, launched in February, 2017, has up-to-date information on virtually all PK-12 reading and math programs currently in dissemination. Among hundreds of programs examined, 113 meet ESSA standards for strong, moderate, or promising evidence of effectiveness. WWC, Evidence for ESSA, and other sources are available online at no cost. The contents of the entire Evidence for ESSA website were imported into Ohio’s own website on this topic, and dozens of states, perhaps all of them, have informed their districts and schools about these sources.

The idea that districts and schools could not find information on proven programs if they wanted to do so is difficult to believe, especially among schools eligible for school improvement grants. Such schools, and the districts in which they are located, write a lot of grant proposals for federal and state funding. The application forms for school improvement grants always explain the evidence requirements, because that is the law. Someone in every state involved with federal funding knows about the WWC and Evidence for ESSA websites. More than 90,000 unique users have used Evidence for ESSA, and more than 800 more sign on each week.


As to rural schools, it is true that many studies of educational programs have taken place in urban areas. However, 47 of the 113 programs qualified by Evidence for ESSA were validated in at least one rural study, or a study including a large enough rural sample to enable researchers to separately report program impacts for rural students. Also, almost all widely disseminated programs have been used in many rural schools. So rural districts and schools that care about evidence can find programs that have been evaluated in rural locations, or at least that were evaluated in urban or suburban schools but widely disseminated in rural schools.

Also, it is important to note that if a program was successfully evaluated only in urban or suburban schools, the program still meets the ESSA evidence standards. If no studies of a given outcome were done in rural locations, a rural school in need of better outcomes could, in effect, be asked to choose between a program proven to work somewhere and probably used in dissemination in rural schools, or they could choose a program not proven to work anywhere. Every school and district has to make the best choices for their kids, but if I were a rural superintendent or principal, I’d read up on proven programs, and then go visit some rural schools using that program nearby. Wouldn’t you?

I have no reason to suspect that the CEP survey is incorrect. There are many indications that district and school leaders often do feel that the ESSA evidence rules are too difficult to meet. So what is really going on?

My guess is that there are many district and school leaders who do not want to know about evidence on proven programs. For example, they may have longstanding, positive relationships with representatives of publishers or software developers, or they may be comfortable and happy with the materials and services they are already using, evidence-proven or not. If publishers and software developers do not have evidence of effectiveness that would pass muster with WWC or Evidence for ESSA, they may push hard on state and district officials, put forward dubious claims for evidence (such as studies with no control groups), and do their best to get by in a system that increasingly demands evidence that they lack. In my experience, district and state officials often complain about having inadequate staff to review evidence of effectiveness, but their concern may be less about finding out what works than about defending themselves from publishers, software developers, or current district or school users of programs, who maintain that they have been unfairly rated by WWC, Evidence for ESSA, or other reviews. State and district leaders who stand up to this pressure may have to spend a lot of time reviewing evidence or hearing arguments.

On the plus side, at the same time that publishers and software producers may be seeking recognition for their current products, many are also sponsoring evaluations of the products that they feel are most likely to perform well in rigorous evaluations. Some may be creating new programs that resemble programs that have met evidence standards. If the federal ESSA law continues to demand evidence for certain federal funding purposes, or even expands this requirement to additional parts of federal grant-making, then over time the ESSA law will have its desired effect, rewarding the creation and evaluation of programs that do meet standards by making it easier to disseminate such programs. The difficulties the evidence movement is experiencing are likely to diminish over time as more proven programs appear, and as federal, state, district, and school leaders get comfortable with evidence.

Evidence-based reform was always going to be difficult, because of the amount of change it entails and the stakes involved. But sooner or later, it is the right thing to do, and leaders who insist on evidence will see increasing levels of learning among their students, at minimal cost beyond what they already spend on untested or ineffective approaches. Medicine went through a similar transition in 1962, when the U.S. Congress first required that medicines be rigorously evaluated for effectiveness and safety. At first, many leaders in the medical profession resisted the changes, but after a while, they came to insist on them. The key is political leadership willing to support the evidence requirement strongly and permanently, so that educators and vendors alike will see that the best way forward is to embrace evidence and make it work for kids.

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Evidence and Policy: If You Want to Make a Silk Purse, Why Not Start With…Silk?

Everyone knows that you can’t make a silk purse out of a sow’s ear. This proverb goes back to the 1500s. Yet in education policy, we are constantly trying to achieve stellar results using school and classroom programs of unknown effectiveness, or even those known to be ineffective, even though proven effective programs are readily available.

Note that I am not criticizing teachers. They do the best they can with the tools they have. What I am concerned about is the quality of those tools, the programs, and professional development teachers receive to help them succeed with their children.

An excellent case in point was School Improvement Grants (SIG), a major provision of No Child Left Behind (NCLB). SIG provided major grants to schools scoring in the lowest 5% of their states. For most of its existence, SIG required schools seeking funding to choose among four models. Two of these, school closure and charterization, were rarely selected. Instead, most SIG schools selected either “turnaround” (replacing the principal and at least 50% of the staff), or the most popular, “transformation” (replacing the principal, using data to inform instruction, lengthening the school day or year, and evaluating teachers based on the achievement growth of their students). However, a major, large-scale evaluation of SIG by Mathematica showed no achievement benefits for schools that received SIG grants, compared to similar schools that did not. Ultimately, SIG spent more than $7 billion, an amount that we in Baltimore, at least, consider to be a lot of money. The tragedy, however, is not just the waste of so much money, but the dashing of so many hopes for meaningful improvement.

This is where the silk purse/sow’s ear analogy comes in. Each of the options among which SIG schools had to choose was composed of components that either lacked evidence of effectiveness or actually had evidence of ineffectiveness. If the components of each option are not known to be effective, then why would anyone expect a combination of them to be effective?

Evidence on school closure has found that this strategy diminishes student achievement for a few years, after which student performance returns to where it was before. Research on charter schools by CREDO (2013) has found an average effect size of zero for charters. The exception is “no-excuses” charters, such as KIPP and Success Academies, but these charters only accept students whose parents volunteer, not whole failing schools. Turnaround and transformation schools both require a change of principal, which introduces chaos and, as far as I know, has never been found to improve achievement. The same is true of replacing at least 50% of the teachers. Lots of chaos, no evidence of effectiveness. The other required elements of the popular “transformation” model have been found to have either no impact (e.g., benchmark assessments to inform teachers about progress; Inns et al., 2019), or small effects (e.g., lengthening the school day or year; Figlio et al., 2018). Most importantly, to my knowledge, no one ever did a randomized evaluation of the entire transformation model, with all components included. We did not find out what the joint effect was until the Mathematica study. Guess what? Sewing together swatches of sows’ ears did not produce a silk purse. With a tiny proportion of $7 billion, the Department of Education could have identified and tested out numerous well-researched, replicable programs and then offered SIG schools a choice among the ones that worked best. A selection of silk purses, all made from 100% pure silk. Doesn’t that sound like a better idea?

In later blogs I’ll say more about how the federal government could ensure the success of educational initiatives by ensuring that schools have access to federal resources to adopt and implement proven programs designed to accomplish the goals of the legislation.


Figlio, D., Holden, K. L., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Why Not the Best?

In 1879, Thomas Edison invented the first practical lightbulb. The main problem he faced was in finding a filament that would glow, but not burn out too quickly. To find it, he tried more than 6000 different substances that had some promise as filaments. The one he found was carbonized cotton, which worked far better than all the others (tungsten, which we use now, came much later).

Of course, the incandescent light changed the world. It replaced far more expensive gas lighting systems, and was much more versatile. The lightbulb captured the evening and nighttime hours for every kind of human activity.

Yet if the lightbulb had been an educational innovation, it probably would have been proclaimed a dismal failure. Skeptics would have noted that only one out of six thousand filaments worked. Meta-analysts would have averaged the effect sizes for all 6000 experiments and concluded that the average effect size across the 6000 filaments was only +0.000000001. Hardly worthwhile. If Edison’s experiments were funded by government, politicians would have complained that 5,999 of Edison’s filaments were a total waste of taxpayers’ money. Economists would have computed benefit-cost ratios and concluded that even if Edison’s light worked, the cost of making the first one was astronomical, not to mention the untold cost of setting up electrical generation and wiring systems.

This is all ridiculous, you must be saying. But in the world of evidence-based education, comparable things happen all the time. In 2003, Borman et al. did a meta-analysis of 300 studies of 29 comprehensive (whole-school) reform designs. They identified three as having solid evidence of effectiveness. Rather than celebrating and disseminating those three (and continuing research and development to identify more of them), the U.S. Congress ended its funding for dissemination of comprehensive school reform programs. Turn out the light before you leave, Mr. Edison!

Another common practice in education is to do meta-analyses averaging outcomes across an entire category of programs or policies, and ignoring the fact that some distinctively different and far more effective programs are swallowed up in the averages. A good example is charter schools. Large-scale meta-analyses by Stanford’s CREDO (2013) found that the average effect sizes for charter schools are effectively zero. A 2015 analysis found better, but still very small effect sizes in urban districts (ES = +0.04 in reading, +0.05 in math). The What Works Clearinghouse published a 2010 review that found slight negative effects of middle school charters. These findings are useful in disabusing us of the idea that charter schools are magic, and get positive outcomes just because they are charter schools. However, they do nothing to tell us about extraordinary charter schools using methods that other schools (perhaps including non-charters) could also use. There is more positive evidence relating to “no-excuses” schools, such as KIPP and Success Academies, but among the thousands of charters that now exist, is this the only type of charter worth replicating? There must be some bright lights among all these bulbs.

As a third example, there are now many tutoring programs used in elementary reading and math with struggling learners. Effect sizes across all forms of tutoring average about +0.30, in both reading and math. But there are reading tutoring approaches with effect sizes of +0.50 or more. If these programs are readily available, why would schools adopt programs less effective than the best? The average is useful for research purposes, and there are always considerations of costs and availability, but I would think any school would want to ignore the average for all types of programs and look into the ones that can do the most for their kids, at a reasonable cost.
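To make the arithmetic concrete, here is a minimal sketch in Python. The program names and effect sizes are hypothetical, chosen only so that the category average comes out near +0.30; they are not taken from any study or review. The sketch simply shows how a category average can sit well below the best program in that category.

```python
from statistics import mean

# Hypothetical effect sizes for illustration only; not taken from any study.
effect_sizes = {
    "Program A": 0.55,
    "Program B": 0.35,
    "Program C": 0.30,
    "Program D": 0.20,
    "Program E": 0.10,
}


def summarize(sizes):
    """Return the category average plus the name and effect size of the best program."""
    average = mean(sizes.values())
    best_name = max(sizes, key=sizes.get)
    return average, best_name, sizes[best_name]


average, best_name, best_size = summarize(effect_sizes)
print(f"Category average: {average:+.2f}")
print(f"Best program: {best_name} ({best_size:+.2f})")
```

A school looking only at the average would see an effect of about +0.30, while the strongest option in this made-up list is nearly twice that.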

I’ve often heard teachers and principals point out that “parents send us the best kids they have.” Yes they do, and for this reason it is our responsibility as educators to give those kids the best programs we can. We often describe educating students as enlightening them, or lifting the lamp of learning, or fiat lux. Perhaps the best way to fiat a little more lux is to take a page from Edison, the great luxmeister: Experiment tirelessly until we find what works. Then use the best we have.


Borman, G.D., Hewes, G. M., Overman, L.T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230.


The Gap

Recently, Maryland released its 2019 state PARCC scores.  I read an article about the scores in the Baltimore Sun.  The pattern of scores was the same as usual, some up, some down. Baltimore City was in last place, as usual.  The Sun helpfully noted that this was probably due to high levels of poverty in Baltimore.  Then the article noted that there was a serious statewide gap between African American and White students, followed by the usual shocked but resolute statements about closing the gap from local superintendents.

Some of the superintendents said that in order to combat the gap, they were going to take a careful look at the curriculum.  There is nothing wrong with looking at curriculum.  All students should receive the best curriculum we can provide them.  However, as a means of reducing the gap, changing the curriculum is not likely to make much difference.

First, there is plentiful evidence from rigorous studies showing that changing from one curriculum to another, or one textbook to another, or one set of standards to another, makes little difference in student achievement.  Some curricula have more interesting or up to date content than others. Some meet currently popular standards better than others. But actual meaningful increases in achievement compared to a control group using the old curriculum?  This hardly ever happens. We once examined all of the textbooks rated “green” (the top ranking on EdReports, which reviews textbooks for alignment with college- and career-ready standards). Out of dozens of reading and math texts with this top rating,  two had small positive impacts on learning, compared to control groups.  In contrast, we have found more than 100 reading and math programs that are not textbooks or curricula that have been found to significantly increase student achievement more than control groups using current methods (see www.evidenceforessa.org).

But remember that at the moment, I am talking about reducing gaps, not increasing achievement overall.  I am unaware of any curriculum, textbook, or set of standards that is proven to reduce gaps. Why should they?  By definition, a curriculum or set of standards is for all students.  In the rare cases when a curriculum does improve achievement overall, there is little reason to expect it to increase performance for one  specific group or another.

The way to actually reduce gaps is to provide something extremely effective for struggling students. For example, the Sun article on the PARCC scores highlighted Lakeland Elementary/Middle, a Baltimore City school that gained 20 points on PARCC since 2015. How did they do it? The University of Maryland, Baltimore County (UMBC) sent groups of undergraduate education majors to Lakeland to provide tutoring and mentoring.  The Lakeland kids were very excited, and apparently learned a lot. I can’t provide rigorous evidence for the UMBC program, but there is quite a lot of evidence for similar programs, in which capable and motivated tutors without teaching certificates work with small groups of students in reading or math.

Tutoring programs and other initiatives that focus on the specific kids who are struggling have an obvious link to reducing gaps, because they go straight to where the problem is rather than doing something less targeted and less intensive.


Serious gap-reduction approaches can be used with any curriculum or set of standards. Districts focused on standards-based reform may also provide tutoring or other proven gap-reduction approaches along with new textbooks to students who need them.  The combination can be powerful. But the tutoring would most likely have worked with the old curriculum, too.

If all struggling students received programs effective enough to bring all of them to current national averages, the U.S. would be the highest-performing national school system in the world.  Social problems due to inequality, frustration, and inadequate skills would disappear. Schools would be happier places for kids and teachers alike.

The gap is a problem we can solve, if we decide to do so.  Given the stakes involved for our economy, society, and future, how could we not?

The Farmer and the Moon Rocks: What Did the Moon Landing Do For Him?

Many, many years ago, during the summer after my freshman year in college, I hitchhiked from London to Iran.  This was the summer of 1969, so Apollo 11 was also traveling.   I saw television footage of the moon landing in Heraklion, Crete, where a television store switched on all of its sets and turned them toward the sidewalk.  A large crowd watched the whole thing.  This was one of the few times I recall when it was really cool to be an American abroad.

After leaving Greece, I went on to Turkey, and then Iran.  In Teheran, I got hold of an English-language newspaper.  It told an interesting story.  In rural Iran, many people believed that the moon was a goddess.  Obviously, a spaceship cannot land on a goddess, so many people concluded that the moon landing must be a hoax.

A reporter from the newspaper interviewed a number of people about the moon landing.  Some were adamant that the landing could not have happened.  However, one farmer was more pragmatic.  He asked the reporter, “I hear the astronauts brought back moon rocks.  Is that right?”

“That’s what they say!” replied the reporter.

“I am fixing my roof, and I could sure use a few of those moon rocks.  Do you think they might give me some?”

The moon rock story illustrates a daunting problem in the dissemination of educational research. Researchers do high-quality research on topics of great importance to the practice of education. They publish this research in top journals, and get promotions and awards for it, but in most cases, their research does not arouse even the slightest bit of interest among the educators for whom it was intended.

The problem relates to the farmer repairing his roof.  He had a real problem to solve, and he needed help with it.  A reporter comes and tells him about the moon landing. The farmer does not think, “How wonderful!  What a great day for science and discovery and the future of mankind!”  Instead, he thinks, “What does this have to do with me?”  Thinking back on the event, I sometimes wonder if he really expected any moon rocks, or if he was just sarcastically saying, “I don’t care.”

Educators care deeply about their students, and they will do anything they can to help them succeed.  But if they hear about research that does not relate to their children, or at least to children like theirs, they are unlikely to care very much.  Even if the research is directly applicable to their students, they are likely to reason, perhaps from long experience, that they will never get access to this research, because it costs money or takes time or upsets established routines or is opposed by powerful groups or whatever.  The result is status quo as far as the eye can see, or implementation of small changes that are currently popular but unsupported by evidence of effectiveness.  Ultimately, the result is cynicism about all research.

Part of the problem is that education is effectively a government monopoly, so entrepreneurship and responsible innovation are difficult to start or maintain.  However, the fact that education is a government monopoly can also be made into a positive, if government leaders are willing to encourage and support evidence-based reform.

Imagine that government decided to provide incentive funding to schools to help them adopt programs that meet a high standard of evidence.  This has actually happened under the ESSA law, but only in a very narrow slice of schools, those very low achieving schools that qualify for school improvement.  Imagine that the government provided a lot more support to schools to help them learn about, adopt, and effectively implement proven programs, and then gradually expanded the categories of schools that could qualify for this funding.

Going back to the farmer and the moon rocks, such a policy would forge a link between exciting research on promising innovations and the real world of practice.  It could cause educators to pay much closer attention to research on practical programs of relevance to them, and to learn how to tell the difference between valid and biased research.  It could help educators become sophisticated and knowledgeable consumers of evidence and of programs themselves.

One of the best examples of the transformation such policies could bring about is agriculture.  Research has a long history in agriculture, and from colonial times, government has encouraged and incentivized farmers to pay attention to evidence about new practices, new seeds, new breeds of animals, and so on.  By the late 19th century, the U.S. Department of Agriculture was sponsoring research, distributing information designed to help farmers be more productive, and much more.  Today, research in agriculture is a huge enterprise, constantly making important discoveries that improve productivity and reduce costs.  As a result, world agriculture, especially American agriculture, is able to support far larger populations at far lower costs than anyone ever thought possible.  The Iranian farmer talking about the moon rocks could not see how advances in science could possibly benefit him personally.  Today, however, in every developed economy, farmers have a clear understanding of the connection between advances in science and their own success.  Everyone knows that agriculture can have bad as well as good effects, as when new practices lead to pollution, but when governments decide to solve those problems, they turn to science. Science is not inherently good or bad, but if it is powerful, then democracies can direct it to do what is best for people.

Agriculture has made dramatic advances over the past hundred years, and continues to make rapid progress by linking science to practice.  In education, we are just starting to make the link between evidence and practice.  Isn’t it time to learn from the experiences of medicine, technology, and agriculture, among many other evidence-based fields, to achieve more rapid progress in educational practice and outcomes?
