Healing Covid-19’s Educational Losses: What is the Evidence?

I’ve written several blogs (here, here, here, here, here, and here) on what schools can do, once they finally reopen for good, to remedy what will surely be serious harm to the educational progress of millions of students. Without doubt, the students suffering the most from lengthy school closures are disadvantaged students, who are the most likely to lack access to remote technology or regular support while their schools are closed.

Recently, several articles have circulated in the education press (e.g., Sawchuk, 2020) and in newsletters laying out options schools might consider to greatly improve the achievement of the students who lost the most and are performing far behind grade level.

The basic problem is that if schools simply start off with usual teaching for each grade level, this may be fine for students at or just below grade level, but for those who are far below level, this is likely to add catastrophe to catastrophe. Students who cannot read the material they are being taught, or who lack the prerequisite skills for their grade level, will experience failure and frustration. So the challenge is to provide students who are far behind with intensive, additional services likely to quickly accelerate their progress, so that they can then profit from ordinary, at-grade-level lessons.

In the publications I’ve seen, several solutions are frequently put forward. This seems a good time to review the most common prescriptions in terms of their basis in rigorous experimental or quasi-experimental research.

Extra Time

One proposal is to extend the school day or school year to provide additional time for instruction. This sounds logical; if the problem is time out of school, let’s add time in school.

The effects of extra time depend, of course, on what schools provide during that additional time. Simply providing more clock hours of typical instruction makes little difference. For example, in a large Florida study (Figlio, Holden, & Ozek, 2018), high-poverty schools were given an additional hour of reading instruction every day for a year. This had a small impact on reading achievement (ES=+0.09) at a cost of about $800 per student, or $300,000-$400,000 per school. Also, in our review of research on secondary reading programs (Baye, Lake, Inns, & Slavin, 2019), my colleagues and I examined whether remedial programs were more effective if they were provided during additional time (one class period a day more than the control group received, for one or more years) or during regular class time (the same amount of time the control group received). The difference was essentially zero. The extra time did not matter. What did matter was what the schools provided (here and here).
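The per-school cost figure follows directly from the per-student cost, if we assume typical enrollments of roughly 375 to 500 students per school (the enrollment range is my back-of-envelope assumption; the study reports the dollar figures):

$$\$800 \times 375 = \$300{,}000 \qquad \$800 \times 500 = \$400{,}000$$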

After-School Programs

Some sources suggest providing after-school programs for students experiencing difficulties. A review of research on this topic by Kidron & Lindsay (2014) examined effects of after-school programs on student achievement in reading and mathematics. The effects were essentially zero. One problem is that students often did not attend regularly, or were poorly motivated when they did attend.

Summer School

As noted in a recent blog, positive effects of summer school were found only when intensive phonics instruction was provided in grades K or 1, but even in these cases, positive effects did not last to the following spring. Summer school is also very expensive.

Tutoring

By far the most effective approach for students struggling in reading or mathematics is tutoring (see blogs here, here, and here). Outcomes for one-to-one or one-to-small group tutoring average +0.20 to +0.30 in both reading and mathematics, and there are several particular programs that routinely report outcomes of +0.40 or more. Using teaching assistants with college degrees as tutors can make tutoring very cost-effective, especially in small-group programs.

Whole-School Reforms

There are a few whole-school reforms that can have substantial impacts on reading and mathematics achievement. A recent review of our elementary school reform model, Success for All (Cheung et al., 2020), found an average effect size of +0.24 for all students across 17 studies, and an average of +0.54 for low achievers.

A secondary reform model called BARR has reported positive reading and mathematics outcomes for ninth graders (T. Borman et al., 2017).

Conclusion

Clearly, something needs to be done about students returning to in-person education who are behind grade level in reading and/or mathematics. But resources devoted to helping these students need to be focused on approaches proven to work. This is not the time to invest in plausible but unproven programs. Students need the best we have that has been repeatedly shown to work.

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2019). Effective reading programs for secondary students. Reading Research Quarterly, 54 (2), 133-166.

Borman, T., Bos, H., O’Brien, B. C., Park, S. J., & Liu, F. (2017). i3 BARR validation study impact findings: Cohorts 1 and 2. Washington, DC: American Institutes for Research.

Cheung, A., Xie, C., Zhang, T., Neitzel, A., & Slavin, R. E. (2020). Success for All: A quantitative synthesis of evaluations. Manuscript submitted for publication. (Contact us for a copy.)

Figlio, D. N., Holden, K. L., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183. https://doi.org/10.1016/j.econedurev.2018.06.003

Kidron, Y., & Lindsay, J. (2014). The effects of increased learning time on student academic and nonacademic outcomes: Findings from a meta‑analytic review (REL 2014-015). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Appalachia.

Sawchuk, S. (2020, August 26). Overcoming Covid-19 learning loss. Education Week, 40 (2), 6.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Is ES=+0.50 Achievable?: Schoolwide Approaches That Might Meet This Standard

In a recent blog, “Make No Small Plans,” I proposed a system innovators could use to create very effective schoolwide programs.  I defined these as programs capable of making a difference in student achievement large enough to bring entire schools serving disadvantaged students to the levels typical of middle class schools.  On average, that would mean creating school models that could routinely add an effect size of +0.50 for entire disadvantaged schools.  +0.50, or half a standard deviation, is roughly the average difference between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic and non-Hispanic White students.
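For readers who want the arithmetic behind these figures: an effect size here is a standardized mean difference. In its simplest form (individual studies may use covariate-adjusted means or different standard deviation estimates, so this is the textbook version, not the exact formula of any one study):

$$ES = \frac{\bar{X}_{\text{program}} - \bar{X}_{\text{control}}}{SD}$$

An ES of +0.50 therefore means that the average student in the program group scores half a standard deviation above the average comparable control student, enough, by the estimates above, to close a typical income-based or ethnic achievement gap.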

Today, I wanted to give some examples of approaches intended to meet the +0.50 goal. From prior work, my colleagues and I have already created a successful schoolwide reform model, Success for All, which, with adequate numbers of tutors (as many as six per school), achieved reading effect sizes in high-poverty Baltimore elementary schools of over +0.50 for all students and +0.75 for the lowest-achieving quarter of students (Madden et al., 1993). These outcomes were maintained through eighth grade, and the program produced substantial reductions in grade retentions and special education placements (Borman & Hewes, 2003). Steubenville, in Ohio’s Rust Belt, uses Success for All in all of its Title I elementary schools, providing several tutors in each. Each year, Steubenville schools score among the highest in Ohio on state tests, exceeding most wealthy suburban schools. Other SFA schools with sufficient tutors are also exemplary in achievement gains. Yet these schools face a dilemma. Most cannot afford significant numbers of tutors. They still get excellent results, but smaller than those typical of SFA schools that do have sufficient tutors.


We are now planning another approach, also intended to produce schoolwide effect sizes of at least +0.50 in schools serving disadvantaged students. In this case, however, our emphasis is on tutoring, the most effective strategy known for improving the achievement of struggling readers (Inns et al., 2019). We are calling this approach the Reading Safety Net. Its main components are as follows:

Tutoring

Like the most successful forms of Success for All, the Reading Safety Net places a substantial emphasis on tutoring.  Tutors will be well-qualified teaching assistants with BAs but not teaching certificates, extensively trained to provide one-to-four tutoring.   Tutors will use a proven computer-assisted model in which students do a lot of pair teaching.  This is what we now call our Tutoring With the Lightning Squad model, which achieved outcomes of +0.40 and +0.46 in two studies in the Baltimore City Public Schools (Madden & Slavin, 2017).  A high-poverty school of 500 students might engage about five tutors, providing extensive tutoring to the majority of students, for as many years as necessary.  One additional tutor or teacher will supervise the tutors and personally work with students having the most serious problems.   We will provide significant training and follow-up coaching to ensure that all tutors are effective.
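As a rough illustration of why about five tutors could reach most of a school of 500, here is a back-of-envelope capacity sketch. The daily schedule (six one-to-four sessions per tutor) is my illustrative assumption, not a Reading Safety Net specification:

# Back-of-envelope tutoring capacity under an assumed schedule.
TUTORS = 5
GROUP_SIZE = 4          # one-to-four tutoring
SESSIONS_PER_DAY = 6    # assumption: six tutoring blocks per tutor per day
ENROLLMENT = 500

served_daily = TUTORS * GROUP_SIZE * SESSIONS_PER_DAY  # 120 students
print(f"{served_daily} students tutored daily "
      f"({served_daily / ENROLLMENT:.0%} of the school)")

Since students exit tutoring as they reach grade level and others rotate in, the share of students served across a school year can be much larger than the roughly 24% served on any given day.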


Attendance and Health

Many students fail in reading or other outcomes because they have attendance problems or certain common health problems. We propose to provide a health aide to help solve these problems.

Attendance

Many students, especially those in high-poverty schools, fail because they do not attend school regularly. Yet there are several proven approaches for increasing attendance and reducing chronic truancy (Shi, Inns, Lake, & Slavin, 2019). Health aides will help teachers and other staff organize and manage effective attendance improvement approaches.

Vision Services

My colleagues and I have designed strategies to help ensure that all students who need eyeglasses receive them. A key problem in this work is ensuring that students who receive glasses use them, keep them safe, and replace them if they are lost or broken. Health aides will coordinate use of proven strategies to increase regular use of needed eyeglasses.


Asthma and Other Health Problems

Many students in high-poverty schools suffer from chronic illnesses. Treatments or preventive measures are known for most of these, but they may not work if medications are not taken daily. For example, asthma is common in high-poverty schools, where it is the top cause of hospital referrals and a leading cause of death for school-age children. Inexpensive inhalers can substantially improve children’s health, yet many children do not regularly take their medicine. Studies suggest that having trained staff ensure that students take their medicine, and watch them do so, can make a meaningful difference. The same may be true of other chronic, easily treated diseases that are common among children but often not consistently treated in inner-city schools. Health aides with special supplemental training may be able to play a key on-the-ground role in helping ensure effective treatment for asthma and other diseases.

Potential Impact

The Reading Safety Net is only a concept at present.  We are seeking funding to support its further development and evaluation.  As we work with front line educators, colleagues, and others to further develop this model, we are sure to find ways to make the approach more effective and cost-effective, and perhaps extend it to solve other key problems.

We cannot yet claim that the Reading Safety Net has been proven effective, although many of its components have been.  But we intend to do a series of pilots and component evaluations to progressively increase the impact, until that impact attains or surpasses the goal of ES=+0.50.  We hope that many other research teams will mobilize and obtain resources to find their own ways to +0.50.  A wide variety of approaches, each of which would be proven to meet this ambitious goal, would provide a range of effective choices for educational leaders and policy makers.  Each would be a powerful, replicable tool, capable of solving the core problems of education.

We know that with sufficient investment and encouragement from funders, this goal is attainable.  If it is in fact attainable, how could we accept anything less?

References

Borman, G., & Hewes, G. (2003).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Manuscript submitted for publication.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of technology-assisted small-group tutoring for struggling readers. Reading & Writing Quarterly, 1-8.

Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L., & Wasik, B. (1993). Success for All: Longitudinal effects of a schoolwide elementary restructuring program. American Educational Research Journal, 30, 123-148.

Shi, C., Inns, A., Lake, C., & Slavin, R. E. (2019). Effective school-based programs for K-12 students’ attendance: A best-evidence synthesis. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

What’s the Evidence that Evidence Works?

I recently gave a couple of speeches on evidence-based reform in education in Barcelona.  In preparing for them, one of the organizers asked me an interesting question: “What is your evidence that evidence works?”

At one level, this is a trivial question. If schools select proven programs and practices aligned with their needs and implement them with fidelity and intelligence, with levels of resources similar to those used in the original successful research, then of course they’ll work, right? And if a school district adopts proven programs, encourages and funds them, and monitors their implementation and outcomes, then of course the appropriate use of all these programs is sure to enhance achievement district-wide, right?

Although logic suggests that a policy of encouraging and funding proven programs is sure to increase achievement on a broad scale, I like to be held to a higher standard: evidence. And it so happens that I have some evidence on this very topic. It comes from a large-scale evaluation of an ambitious national effort to increase the use of proven and promising schoolwide programs in elementary and middle schools, carried out by a research center funded by the Institute of Education Sciences (IES) called the Center for Data-Driven Reform in Education, or CDDRE (see Slavin, Cheung, Holmes, Madden, & Chamberlain, 2013). The program the experimental schools used was called Raising the Bar.

How Raising the Bar Raised the Bar

The idea behind Raising the Bar was to help schools analyze their own needs and strengths, and then select whole-school reform models likely to help them meet their achievement goals. CDDRE consultants provided about 30 days of on-site professional development to each district over a 2-year period. The PD focused on review of data, effective use of benchmark assessments, school walk-throughs by district leaders to see the degree to which schools were already using the programs they claimed to be using, and then exposing district and school leaders to information and data on schoolwide programs available to them, from several providers. If a district selected a program to implement, district and school leaders received PD on ensuring effective implementation, and principals and teachers received PD on the programs they chose.


Evaluating Raising the Bar

In the study of Raising the Bar we recruited a total of 397 elementary and 225 middle schools in 59 districts in 7 states (AL, AZ, IN, MS, OH, TN). All schools were Title I schools in rural and mid-sized urban districts. Overall, 30% of students were African-American, 20% were Hispanic, and 47% were White. Across three cohorts, starting in 2005, 2006, or 2007, schools were randomly assigned to either use Raising the Bar, or to continue with what they were doing. The study ended in 2009, so schools could have been in the Raising the Bar group for two, three, or four years.

Did We Raise the Bar?

State test scores were obtained from all schools and transformed to z-scores so they could be combined across states. The analyses focused on grades 5 and 8, as these were the only grades tested in some states at the time. Hierarchical linear modeling, with schools nested within districts, was used for analysis.
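To illustrate the standardization step (this is my sketch, not the study’s actual code), scores from different state tests can be placed on a common scale by converting each school’s score to a z-score within its own state and year:

import pandas as pd

# Illustrative data: one row per school, with the school's mean score
# on its own state test (states use different score scales).
df = pd.DataFrame({
    "state": ["OH", "OH", "TN", "TN"],
    "year":  [2009, 2009, 2009, 2009],
    "mean_score": [412.0, 398.0, 51.2, 47.8],
})

# Standardize within state and year so schools are comparable across states.
grp = df.groupby(["state", "year"])["mean_score"]
df["z"] = (df["mean_score"] - grp.transform("mean")) / grp.transform("std")
print(df)

The hierarchical linear models were then fit to scores on this common z-score scale.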

For reading in fifth grade, outcomes were very good, with significant individual-level effect sizes of +0.10 in Year 3 and +0.19 in Year 4. In middle school reading, the effect size reached +0.10 by Year 4.

Effects were also very good in fifth grade math, with significant effects of +0.10 in Year 3 and +0.13 in Year 4. Effect sizes in middle school math were also significant in Year 4 (ES=+0.12).

Note that these effects are for all schools assigned to Raising the Bar, whether they adopted a program or not. Non-experimental analyses found that by Year 4, elementary schools that had chosen and implemented a reading program (33% of schools by Year 3, 42% by Year 4) scored better than matched controls in reading. Schools that chose any reading program usually chose our Success for All reading program, but some chose other models. Even Raising the Bar schools that did not adopt a reading or math program scored higher, on average (though not always significantly higher), than control schools.

How Much Did We Raise the Bar?

The CDDRE project was exceptional because of its size and scope. The 622 schools, in 59 districts in 7 states, were collectively equivalent to a medium-sized state. So if anyone asks what evidence-based reform could do to help an entire state, this study provides one estimate. The student-level outcome in elementary reading, an effect size of +0.19, applied to NAEP scores, would be enough to move 43 states to the scores now only attained by the top 10. If applied successfully to schools serving mostly African American and Hispanic students or to students receiving free- or reduced-price lunches regardless of ethnicity, it would reduce the achievement gap between these and White or middle-class students by about 38%. All in four years, at very modest cost.
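The 38% estimate is simple arithmetic, assuming the roughly half-standard-deviation achievement gap discussed in the earlier posts in this collection:

$$\frac{0.19}{0.50} = 0.38 \approx 38\%$$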

Actually, implementing something like Raising the Bar could be done much more easily and effectively today than it could in 2005-2009. First, there are many more proven programs to choose from than there were then. Second, the Every Student Succeeds Act (ESSA) now defines strong, moderate, and promising levels of evidence, and restricts school improvement grants to schools that choose programs meeting them. The reason only 42% of Raising the Bar schools selected a program is that they had to pay for it, and many could not afford to do so. Today, there are resources to help with this.

The evidence is both logical and clear: Evidence works.

Reference

Slavin, R. E., Cheung, A., Holmes, G., Madden, N. A., & Chamberlain, A. (2013). Effects of a data-driven district reform model on state assessment outcomes. American Educational Research Journal, 50 (2), 371-396.


This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

New Findings on Tutoring: Four Shockers

One-to-one and one-to-small group tutoring have long existed as remedial approaches for students who are performing far below expectations. Everyone knows that tutoring works, and nothing in this blog contradicts this. Although different approaches have their champions, the general consensus is that tutoring is very effective, and the problem with widespread use is primarily cost (and for tutoring by teachers, availability of sufficient teachers). If resources were unlimited, one-to-one tutoring would be the first thing most educators would recommend, and they would not be wrong. But resources are never unlimited, and the numbers of students performing far below grade level are overwhelming, so cost-effectiveness is a serious concern. Further, tutoring seems so obviously effective that we may not really understand what makes it work.

In recent reviews, my colleagues and I examined what is known about tutoring. Beyond the simple conclusion that “tutoring works,” we found some big surprises, four “shockers.” Prepare to be amazed! Further, I propose an explanation to account for these unexpected findings.

We have recently released three reviews that include thorough, up-to-date syntheses of research on tutoring. One is a review of research on programs for struggling readers in elementary schools by Amanda Inns and colleagues (2018). Another is a review on programs for secondary readers by Ariane Baye and her colleagues (2017). Finally, there is a review on elementary math programs by Marta Pellegrini et al. (2018). All three use essentially identical methods, from the Best Evidence Encyclopedia (www.bestevidence.org). In addition to sections on tutoring strategies, all three also include other, non-tutoring methods directed at the same populations and outcomes.

What we found challenges much of what everyone thought they knew about tutoring.

Shocker #1: In all three reviews, tutoring by paraprofessionals (teaching assistants) was at least as effective as tutoring by teachers. This was found for reading and math, and for one-to-one and one-to-small group tutoring.  For struggling elementary readers, para tutors actually had higher effect sizes than teacher tutors. Effect sizes were +0.53 for paras and +0.36 for teachers in one-to-one tutoring. For one-to-small group, effect sizes were +0.27 for paras, +0.09 for teachers.

Shocker #2: Volunteer tutoring was far less effective than tutoring by either paras or teachers. Some programs using volunteer tutors provided them with structured materials and extensive training and supervision. These found positive impacts, but far smaller than those for paraprofessional tutors: volunteers tutoring one-to-one had an effect size of +0.18, while paras had an effect size of +0.53. Because of the need for recruiting, training, supervision, and management, and also because the more effective tutoring models provide stipends or other pay, volunteers were not much less expensive than paraprofessionals as tutors.

Shocker #3:  Inexpensive substitutes for tutoring have not worked. Everyone knows that one-to-one tutoring works, so there has long been a quest for approaches that simulate what makes tutoring work. Yet so far, no one, as far as I know, has found a way to turn lead into tutoring gold. Although tutoring in math was about as effective as tutoring in reading, a program that used online math tutors communicating over the Internet from India and Sri Lanka to tutor students in England, for example, had no effect. Technology has long been touted as a means of simulating tutoring, yet even when computer-assisted instruction programs have been effective, their effect sizes have been far below those of the least expensive tutoring models, one-to-small group tutoring by paraprofessionals. In fact, in the Inns et al. (2018) review, no digital reading program was found to be effective with struggling readers in elementary schools.

Shocker #4: Certain whole-class and whole-school approaches work as well as or better than tutoring for struggling readers, on average. In the Inns et al. (2018) review, the average effect size for one-to-one tutoring approaches was +0.31, and for one-to-small group approaches it was +0.14. Yet whole-class approaches, such as Ladders to Literacy (ES = +0.48), PALS (ES = +0.65), and Cooperative Integrated Reading and Composition (ES = +0.19), averaged +0.33, similar to one-to-one tutoring by teachers (ES = +0.36). The mean effect size for comprehensive tiered school approaches, such as Success for All (ES = +0.41) and Enhanced Core Reading Instruction (ES = +0.22), was +0.43, higher than any category of tutoring (note that these models include tutoring as part of an integrated response to intervention approach). Whole-class and whole-school approaches work with many more students than do tutoring models, so these impacts are obtained at a much lower cost per pupil.

Why does tutoring work?

Most researchers and others would say that well-structured tutoring models work primarily because they allow tutors to fully individualize instruction to the needs of students. Yet if this were the only explanation, then other individualized approaches, such as computer-assisted instruction, would have outcomes similar to those of tutoring. Why is this not the case? And why do paraprofessionals produce at least equal outcomes to those produced by teachers as tutors? None of this squares with the idea that the impact of tutoring is entirely due to the tutor’s ability to recognize and respond to students’ unique needs. If that were so, other forms of individualization would be a lot more effective, and teachers would presumably be a lot more effective at diagnosing and responding to students’ problems than would less highly trained paraprofessionals. Further, whole-class and whole-school reading approaches, which are not completely individualized, would have much lower effect sizes than tutoring.

My theory to account for the positive effects of tutoring in light of the four “shockers” is this:

  • Tutoring does not work due to individualization alone. It works due to individualization plus nurturing and attention.

This theory begins with the fundamental and obvious assumption that children, perhaps especially low achievers, are highly motivated by nurturing and attention, perhaps far more than by academic success. They are eager to please adults who relate to them personally.  The tutoring setting, whether one-to-one or one-to-very small group, gives students the undivided attention of a valued adult who can give them personal nurturing and attention to a degree that a teacher with 20-30 students cannot. Struggling readers may be particularly eager to please a valued adult, because they crave recognition for success in a skill that has previously eluded them.

Nurturing and attention may explain the otherwise puzzling equality of outcomes obtained by teachers and paraprofessionals as tutors. Both types of tutors, using structured materials, may be equally able to individualize instruction, and there is no reason to believe that paras will be any less nurturing or attentive. The assumption that teachers would be more effective as tutors depends on the belief that tutoring is complicated and requires the extensive education a teacher receives. This may be true for very unusual learners, but for most struggling students, a paraprofessional may be as capable as a teacher in providing individualization, nurturing, and attention. This is not to suggest that paraprofessionals are as capable as teachers in every way. Teachers have to be good at many things: preparing and delivering lessons, managing and motivating classes, and much more. However, in their roles as tutors, teachers and paraprofessionals may be more similar.

Volunteers certainly can be nurturing and attentive, and can be readily trained in structured programs to individualize instruction. The problem, however, is that studies of volunteer programs report difficulties in getting volunteers to attend every day and to avoid dropping out when they get a paying job. This may be less of a problem when volunteers receive a stipend; paid volunteers are much more effective than unpaid ones.

The failure of tutoring substitutes, such as individualized technology, is easy to predict if the importance of nurturing and attention is taken into account. Technology may be fun, and may be individualized, but it usually separates students from the personal attention of caring adults.

Whole-Class and Whole-School Approaches

Perhaps the biggest shocker of all is the finding that for struggling readers, certain non-technology approaches to instruction for whole classes and schools can be as effective as tutoring. Whole-class and whole-school approaches can serve many more students at much lower cost, of course. These classroom approaches mostly use cooperative learning, phonics-focused teaching, or both, and the whole-school models, especially Success for All, combine these approaches with tutoring for students who need it.

The success of certain whole-class programs, of certain tutoring approaches, and of whole-school approaches that combine proven teaching strategies with tutoring for students who need more, argues for response to intervention (RTI), the policy that has been promoted by the federal government since the 1990s. So what’s new? What’s new is that the approach I’m advocating is not just RTI. It’s RTI done right, where each component of  the strategy has strong evidence of effectiveness.

The good news is that we have powerful and cost-effective tools at our disposal that we could be putting to use on a much more systematic scale. Yet we rarely do this, and as a result far too many students continue to struggle with reading, even ending up in special education due to problems schools could have prevented. That is the real shocker. It’s up to our whole profession to use what works, until reading failure becomes a distant memory. There are many problems in education that we don’t know how to solve, but reading failure in elementary school isn’t one of them.

Practical Implications

Perhaps the most important practical implication of this discussion is the realization that benefits similar to or greater than those of one-to-one tutoring by teachers can be obtained in other ways that can be cost-effectively extended to many more students: using paraprofessional tutors, using one-to-small group tutoring, or using whole-class and whole-school tiered strategies. It is no longer possible to say with a shrug, “of course tutoring works, but we can’t afford it.” The “four shockers” tell us we can do better, without breaking the bank.

 

References

Baye, A., Lake, C., Inns, A., & Slavin, R. (2017). Effective reading programs for secondary students. Manuscript submitted for publication. Also see Baye, A., Lake, C., Inns, A. & Slavin, R. E. (2017, August). Effective Reading Programs for Secondary Students. Baltimore, MD: Johns Hopkins University, Center for Research and Reform in Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2018). Effective programs for struggling readers: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., & Slavin, R. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


 

Where Will the Capacity for School-by-School Reform Come From?

In recent months, I’ve had a number of conversations with state and district leaders about implementing the ESSA evidence standards. To its credit, ESSA diminishes federal micromanaging, and gives more autonomy to states and locals, but now that the states and locals are in charge, how are they going to achieve greater success? One state department leader described his situation in ESSA as being like that of a dog who’s been chasing cars for years, and then finally catches one. Now what?

ESSA encourages states and local districts to help schools adopt and effectively implement proven programs. For school improvement, portions of Title II, and Striving Readers, ESSA requires use of proven programs. Initially, state and district folks were worried about how to identify proven programs, though things are progressing on that front (see, for example, www.evidenceforessa.org). But now I’m hearing a lot more concern about capacity to help all those individual schools do needs assessments, select proven programs aligned with their needs, and implement them with thought, care, and knowledgeable application of implementation science.

I’ve been in several meetings where state and local folks ask federal folks how they are supposed to implement ESSA. “Regional educational labs will help you!” they suggest. With all due respect to my friends in the RELs, this is going to be a heavy lift. There are ten of them, in a country with about 52,000 Title I schoolwide projects. So each REL is responsible for, on average, five states, 1,400 districts, and 5,200 high-poverty schools. For this reason, RELs have long been primarily expected to work with state departments. There are just not enough of them to serve many individual districts, much less schools.
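The per-REL averages are just the national totals divided by ten (the figure of roughly 14,000 districts nationally is my approximation; the author supplies the school count):

$$\frac{50}{10} = 5 \text{ states} \qquad \frac{14{,}000}{10} = 1{,}400 \text{ districts} \qquad \frac{52{,}000}{10} = 5{,}200 \text{ schools}$$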

State departments of education and districts can help schools select and implement proven programs. For example, they can disseminate information on proven programs, make sure that recommended programs have adequate capacity, and perhaps hold effective methods “fairs” to introduce people in their state to program providers. But states and districts rarely have the capacity to implement proven programs themselves. It is very hard to build state and local capacity to support specific proven programs. For example, when downturns in state or district funding come, the first departments to be cut back or eliminated often involve professional development. For this reason, few state departments or districts have large, experienced professional development staffs. Further, constant changes in state and local superintendents, boards, and funding levels make it difficult to build up professional development capacity over a period of years.

Because of these problems, schools have often been left to make up their own approaches to school reform. This happened on a wide scale in the NCLB School Improvement Grants (SIG) program, where federal mandates specified very specific structural changes but left the essentials (teaching, curriculum, and professional development) up to the locals. The MDRC evaluation of SIG schools found that they made no better gains than similar, non-SIG schools.

Yet there is substantial underutilized capacity available to help schools across the U.S. to adopt proven programs. This capacity resides in the many organizations (both non-profit and for-profit) that originally created the proven programs, provided the professional development that caused them to meet the “proven” standard, and likely built infrastructure to ensure quality, sustainability, and growth potential.

The organizations that created proven programs have obvious advantages (their programs are known to work), but they also have several less obvious advantages. One is that organizations built to support a specific program have a dedicated focus on that program. They build expertise on every aspect of the program. As they grow, they hire capable coaches, usually ones who have already shown their skills in implementing or leading the program at the building level. Unlike states and districts, which often live in constant turmoil, reform organizations and for-profit professional development organizations are likely to have stable leadership over time. In fact, for a high-poverty school engaged with a program provider, that provider and its leadership may be the only partner stable enough to help the school improve its core teaching over many years.

State and district leaders play major roles in accountability, management, quality assurance, and personnel, among many other issues. With respect to implementation of proven programs, they have to set up conditions in which schools can make informed choices, monitor the performance of provider organizations, evaluate outcomes, and ensure that schools have the resources and supports they need. But truly reforming hundreds of schools in need of proven programs one at a time is not realistic for most states and districts, at least not without help. It makes a lot more sense to seek capacity in organizations designed to provide targeted professional development services on proven programs, and then coordinate with these providers to ensure benefits for students.

This blog is sponsored by the Laura and John Arnold Foundation.

Evidence and Freedom

One of the strangest arguments I hear against evidence-based reform in education is that encouraging or incentivizing schools to use programs or practices proven to work in rigorous experiments will reduce the freedom of schools to do what they think is best for their students.

Freedom? Really?

To start with, consider how much freedom schools have now. Many districts and state departments of education have elaborate 100% evidence-free processes of restricting the freedom of schools. They establish lists of approved providers of textbooks, software, and professional development, based perhaps on state curriculum standards but also on current trends, fads, political factors, and preferences of panels of educators and other citizens. Many states have textbook adoption standards that consider paper weight, attractiveness, politically correct language, and other surface factors, but never evidence of effectiveness. Federal policies specify how teachers should be evaluated, how federal dollars should be utilized, and how students should be assessed. I could go on for more pages than anyone wants to read with examples of how teachers’ and principals’ choices are constrained by district, state, and federal policies, very few of which have ever been tested in comparison to control groups. Why do schools use this textbook or that software or the other technology? Because their district or state bought it for them, trained them in its use (perhaps), and gave them no alternative.

The evidence revolution offers the possibility of freedom, if the evidence now becoming widely available is used properly. The minimum principle of evidence-based reform should be this: “If it is proven to work, you are allowed to use it.”

At bare minimum, evidence of effectiveness should work as a “get out of jail free” card to counter whatever rules, restrictions, or lists of approved materials schools have been required to follow.

But permission is not enough, because mandated, evidence-free materials, software, and professional development may eat up the resources needed to implement proven programs. So here is a slightly more radical proposition: “Whenever possible, school staffs should have the right, by majority vote of the staff, to adopt proven programs to replace current programs mandated by the district or state.”

For example, when a district or state requires use of anything, it could make the equivalent in money available to schools to use to select and implement programs proven to be effective in producing the desired outcome. If the district adopts a new algebra text or elementary science curriculum, for instance, it could allow schools to select an alternative with good evidence of effectiveness for algebra or elementary science, as long as the school agrees to implement the program with fidelity and care, achieving levels of implementation like those in the research that validated the program.

The next level of freedom to choose what works would be to provide incentives and support for schools that select proven programs and promise to implement them with fidelity.

“Schools should be able to apply for federal, state, or local funds to implement proven programs of their choice. Alternatively, they may receive competitive preference points on grants if they promise to adopt and effectively implement proven programs.”

This principle exists today in the Every Student Succeeds Act (ESSA), where schools applying for school improvement funding must select programs that meet one of three levels of evidence: strong (at least one randomized experiment with positive outcomes), moderate (at least one quasi-experimental [matched] study with positive outcomes), or promising (at least one correlational study with positive outcomes). In seven other programs in ESSA, schools applying for federal funds receive extra competitive preference points on their applications if they commit to using programs that meet one of those three levels of evidence. The principle in ESSA – that use of proven programs should be encouraged – should be expanded to all parts of government where proven programs exist.

One problem with these principles is that they depend on having many proven programs in each area from which schools can choose. At least in reading and math, grades K-12, this has been accomplished; our Evidence for ESSA website describes approximately 100 programs that meet the top three ESSA evidence standards. More than half of these meet the “strong” standard.

However, we must have a constant flow of new approaches in all subjects and grade levels. Evidence-based policy requires continuing investments in development, evaluation, and dissemination of proven programs. The Institute of Education Sciences (IES), the Investing in Innovation (i3) program, and now the Education Innovation and Research (EIR) grant program, help fulfill this function, and they need to continue to be supported in their crucial work.

So is this what freedom looks like in educational innovation? I would argue that it does. Note that what I did not say is that programs lacking evidence should be forbidden. Mandating use of programs, no matter how well evaluated, is a path to poor implementation and political opposition. Instead, schools should have the opportunity and the funding to adopt proven programs. If they prefer not to do so, that is their choice. But my hope and expectation is that in a political system that encourages and supports use of proven programs, educators will turn out in droves to use better programs, and the schools that might have been reluctant at first will see and emulate the success their neighbors are having.

Freedom to use proven programs should help districts, states, and the federal government have confidence that they can at long last stop trying to micromanage schools. If policymakers know that schools are making good choices and getting good results, why should they want to get in their way?

Freedom to use whatever is proven to enhance student learning. Doesn’t that have a nice ring to it? Like the Liberty Bell?

This blog is sponsored by the Laura and John Arnold Foundation.

You Can Step Twice in the Same River: Systems in Education

You can never step twice in the same river.  At least that is what Greek philosopher Heraclitus said a long time ago, when Socrates was just a pup.  What he meant, of course, was that a river is constantly changing, for reasons large and small, so the river you waded across yesterday, or even a minute ago, is not the same one you wade in now.

This proposition is both obvious and wrong.  Sure, rivers are never 100% the same.  But does it matter?  Imagine, for example, that you somehow drained all the water out of a river.  Within a few days or weeks, it would entirely revive itself.  The reason is that a river is not a “thing.”  It is a system.  In other words, a river exists because there is a certain level of rainfall or groundwater or water from upstream, and then a certain topography (rivers are in low-lying areas, compared to surrounding land).  Those factors create the river, and as long as they exist, the river exists.  So when you wade into a river, you are wading into a system, and (sorry, Heraclitus) it is always the same system, because even if the river is higher or lower or muddier or clearer than usual, the system is always the same, unless something pretty dramatic happens upstream.

So why am I rattling on about rivers?  The point I hope to make is that genuine and lasting change in a school depends on changing the system in which the school operates, not just small parts of the school that will be swept away if the system stays unchanged.

Here’s what I mean from an education reform perspective.  Teachers’ daily practices in classrooms are substantially determined by powerful systems.  Whatever innovations you introduce in a school, no matter how effective in the short term, will be eliminated and forgotten if the rest of the system does not change.  For example, if a school implements a great new math program but does not solve classroom management or attendance problems, the school may not maintain its math reform.  Lasting change in math, for example, might require attending to diversity in achievement levels by providing effective tutoring or small-group assistance.  It might require providing eyeglasses to children who need them.  It might require improving reading performance as well as math.  It might require involving parents.  It might require constant monitoring of students’ math performance and targeted responses to solve problems.  It might require recruiting volunteers, or making good use of after school or summer time.  It might require mobilizing department heads or other math leaders within the school to support implementation, and to help maintain the effective program when (predictable) turmoil threatens it.  Policy changes at the district, state, and national levels may also help, but I’m just focusing for the moment on aspects of the system that an individual school or district can implement on its own.  Attending to all of these factors at once may increase the chances that in five or ten years, the effective program remains in place and stays effective, even if the original principal, department head, teachers, and special funds are no longer at the school.

It’s not that every school has to do all of these things to improve math performance over time, but I would argue that lasting impact will depend on some constellation of supports that change the system in which the math reform operates.  Otherwise, the longstanding system of the school will return, washing away the reform and taking the school back to its pre-reform behaviors and policies.

A problem in all of this is that educational development and research often work against systemic change.  In particular, academic researchers are rewarded for publishing articles, and it helps if they evaluate approaches that purely represent a given theory.  Pragmatically, an approach with many components may be more expensive and more difficult to put in place.  As a result, a lot of proven programs available to educators are narrow, focused on the main objective but not on the broader system of the school.  This may be fine in the short run, but in the long run the narrowly focused treatment may not maintain over time.

Seen as a system, a river will never change its course until the key elements that determine its course themselves change.  Unless that happens, we’ll always be stepping into the same river, over and over again, and getting the same results.

Transforming Transformation (and Turning Around Turnaround)

At the very end of the Obama Administration, the Institute of Education Sciences (IES) released the final report of an evaluation of the outcomes of the federal School Improvement Grant program. School Improvement Grants (SIG) are major investments to help schools with the lowest academic achievement in their states to greatly improve their outcomes.

The report, funded by the independent and respected IES and carried out by the equally independent and respected Mathematica Policy Research, found that SIG grants made essentially no difference in the achievement of the students in schools that received them.

Bummer.

In Baltimore, where I live, we believe that if you spend $7 billion on something, as SIG has so far, you ought to have something to show for it. The disappointing findings of the Mathematica evaluation are bad news for all of the usual reasons. Even if there were some benefits, SIG turned out to be a less-than-compelling use of taxpayers’ funds.  The students and schools that received it really needed major improvement, but improved very little. The findings undermine faith in the ability of very low-achieving schools to turn themselves around.

However, the SIG findings are especially frustrating because they could have been predicted, were in fact predicted by many, and were apparent long before this latest report. There is no question that SIG funds could have made a substantial difference. Had they been invested in proven programs and practices, they would have surely improved student outcomes just as they did in the research that established the effectiveness of the proven programs.

But instead of focusing on programs proven to work, SIG forced schools to choose among four models that had never been tried before and were very unlikely to work.

Three of the four models were so draconian that few schools chose them. One involved closing the school, and another, conversion to a charter school. These models were rarely selected unless schools were on the way to doing these things anyway. Somewhat more popular was “turnaround,” which primarily involved replacing the principal and 50% of the staff. The least restrictive model, “transformation,” involved replacing the principal, using achievement growth to evaluate teachers, using data to inform instruction, and lengthening the school day or year.

The problem is that very low achieving schools are usually in low achieving areas, where there are not long lines of talented applicants for jobs as principals or teachers. A lot of school districts just swapped principals between SIG and non-SIG schools. None of the mandated strategies had a strong research base, and they still don’t. Low achieving schools usually have limited capacity to reform themselves under the best of circumstances, and SIG funding required replacing principals, good or bad, thereby introducing instability in already tumultuous places. Further, all four of the SIG models had a punitive tone, implying that the problem was bad principals and teachers. Who wants to work in a school that is being punished?

What else could SIG have done?

SIG could have provided funding to enable low-performing schools and their districts to select among proven programs. This would have maintained an element of choice while ensuring that whatever programs schools chose would have been proven effective, used successfully in other low-achieving schools, and supported by capable intermediaries willing and able to work effectively in struggling schools.

Ironically, SIG did finally introduce such an option, but it was too little, too late.  In 2015, SIG introduced two additional models, one of which was an Evidence-Based, Whole-School Reform model that would allow schools to utilize SIG funds to adopt a proven whole-school approach. The U.S. Department of Education carefully reviewed the evidence and identified four approaches with strong evidence and the ability to expand that could be utilized under this model. But hardly any schools chose to utilize these approaches because there was little promotion of the new models, and few school, district, or state leaders to this day even know they exist.

The old SIG program is changing under the Every Student Succeeds Act (ESSA). In order to receive school improvement funding under ESSA, schools will have to select from programs that meet the strong, moderate, or promising evidence requirements defined in ESSA. Evidence for ESSA, the free web site we are due to release later this month, will identify more than 90 reading and math programs that meet these requirements.

This is a new opportunity for federal, state, and district officials to promote the use of proven programs and build local capacity to disseminate proven approaches. Instead of being seen as a trip to the woodshed, school improvement funding might be seen as an opportunity for eager teachers and administrators to do cutting edge instruction. Schools using these innovative approaches might become more exciting and fulfilling places to work, attracting and retaining the best teachers and administrators, whose efforts will be reflected in their students’ success.

Perhaps this time around, school improvement will actually improve schools.

Perfect Implementation of Hopeless Methods: The Sinking of the Vasa

If you are ever in Stockholm, you must visit the Vasa Museum. It contains a complete warship launched in 1628 that sank 30 minutes later. Other than the ship itself, the museum contains objects and bones found in the wreck, and carefully analyzed by scientists.

The basic story of the sinking of the Vasa has important analogies to what often happens in education reform.

After the Vasa sank, the king who had commissioned it, Gustav II Adolf, called together a commission to find out whose fault it was and to punish the guilty.

Yet the commission, after many interviews with survivors, found that no one did anything wrong. Three and a half centuries later, modern researchers came to the same conclusion. Everything was in order. The skeleton of the helmsman was found still gripping the steering pole, trying heroically to turn the ship’s bow into the wind to keep it from leaning over.

So what went wrong? The ship could never have sailed. It was built too top-heavy, with too much heavy wood and too many heavy guns on the top decks and too little ballast on the bottom. The Vasa was doomed, no matter what the captain and crew did.

In education reform, there is a constant debate about how much is contributed to effectiveness by a program as opposed to quality of implementation. In implementation science, there are occasionally claims that it does not matter what programs schools adopt, as long as they implement them well. But most researchers, developers, and educators agree that success only results from a combination of good programs and good implementation. Think of the relationship as multiplicative:

P × I = A

(Quality of program times quality of implementation equals achievement gain).

The reason the relationship might be multiplicative is that if either P or I is zero, achievement gain is zero. If both are very positive, then achievement gain is very, very positive.
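A quick numeric illustration, with arbitrary values chosen only to show the shape of the relationship:

$$0 \times 1.0 = 0 \qquad 0.8 \times 0.2 = 0.16 \qquad 0.8 \times 0.9 = 0.72$$

A hopeless program perfectly implemented yields nothing; a strong program weakly implemented yields little; only a strong program implemented well yields a large gain.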

In the case of the Vasa, P=0, so no matter how good implementation was, the Vasa was doomed. In many educational programs, the same is true. For example, programs that are not well worked out, not well integrated into teachers’ schedules and skill sets, or are too difficult to implement, are unlikely to work. One might argue that in order to have positive effects, a program must be very clear about what teachers are expected to do, so that professional development and coaching can be efficiently targeted to helping teachers do those things. Then we have to have evidence that links teachers’ doing certain things to improving student learning. For example, providing teachers with professional development to enhance their content knowledge may not be helpful if teachers are not clear how to put this new knowledge into their daily teaching.

Rigorous research, especially under funding from IES and i3 in the U.S. and from EEF in England, is increasingly identifying proven programs as well as programs that consistently fail to improve student outcomes. The patterns are not perfectly clear, but in general those programs that do make a significant difference are ones that are well-designed, practical, and coherent.

If you think implementation alone will carry the day, keep in mind the skeleton of the heroic helmsman of the Vasa, spending 333 years on the seafloor trying to push the Vasa’s bow into the wind. He did everything right, except for signing on to the wrong ship.

What Schools in One Place Can Learn from Schools Elsewhere

In a recent blog, I responded to an article by Lisbeth Schorr and Srik Gopal about their concerns that the findings of randomized experiments will not generalize from one set of schools to another. I got a lot of supportive response to the blog, but I realize that I left out a key point.

The missing point was this: the idea that effective programs readily generalize from one place to another is not theoretical. It happens all the time. I try to avoid talking about our own programs, but in this case, it’s unavoidable. Our Success for All program started almost 30 years ago, working with African American students in Baltimore. We got terrific results with those first schools. But our first dissemination schools beyond Baltimore included a Philadelphia school primarily serving Cambodian immigrants, rural schools in the South, small town schools in the Midwest, and so on. We had to adapt and refine our approaches for these different circumstances, but we found positive effects across a very wide range of settings and circumstances. Over the years, some of our most successful schools have been ones serving Native Americans, such as a school in the Arizona desert and a school in far northern Quebec. Another category of schools where we see outstanding success is those serving Hispanic students, including English language learners, as in the Alhambra district in Phoenix and a charter school near Los Angeles. One of our most successful districts anywhere is in small-city Steubenville, Ohio. We have established a successful network of SFA schools in England and Wales, where we have extraordinary schools primarily serving Pakistani, African, and disadvantaged White students in a very different policy context from the one we face in the U.S. And yes, we continue to find great results in Baltimore and in cities that resemble our original home, such as Detroit.

The ability to generalize from one set of schools to others is not at all limited to Success for All. Reading Recovery, for example, has had success in every kind of school, in countries throughout the world. Direct Instruction has also been successful in a wide array of types of schools. In fact, I’d argue that it is rare to find programs that have been proven to be effective in rigorous research that then fail to generalize to other schools, even ones that are quite different. Of course, there is great variation in outcomes in any set of schools using any innovative program, but that variation has to do with leadership, local support, resources, and so on, not with a fundamental limitation on generalizability to additional populations.

How is it possible that programs initially designed for one setting and population so often generalize to others? My answer would be that in most fundamental regards, the closer you get to the classroom, the more schools begin to resemble each other. Individual students do not all learn the same way, but every classroom contains a range of students who have a predictable set of needs. Any effective program has to be able to meet those needs, wherever the school happens to be located. For example, every classroom has some number of kids who are confident, curious, and capable, some number who are struggling, some number who are shy and quiet, some number who are troublemakers. Most contain students who are not native speakers of English. Any effective program has to have a workable plan for each of these types of students, even if the proportions of each may vary from classroom to classroom and school to school.

There are reasonable adaptations necessary for different school contexts, of course. There are schools where attendance is a big issue and others where it can be assumed, schools where safety is a major concern and others where it is less so. Schools in rural areas have different needs from those in urban or suburban ones, and obviously schools with many recent immigrants have different needs from those in which all students are native speakers of English. Involving parents effectively looks different in different places, and there are schools in which eyeglasses and other health concerns can be assumed to be taken care of and others where they are major impediments to success. But after the necessary accommodations are made, you come down to a teacher and twenty to thirty children who need to be motivated, to be guided, to have their individual needs met, and to have their time used to greatest effect. You need to have an effective plan to manage diverse needs and to inspire kids to see their own possibilities. You need to fire children’s imaginations and help them use their minds well to write and solve problems and imagine their own futures. These needs exist equally in Peru and Poughkeepsie, in the Arizona desert or the valleys of Wales, in Detroit or Eastern Kentucky, in California or Maine.

Disregarding evidence from randomized experiments because it does not always replicate is a recipe for the status quo, as far as the eye can see. And the status quo is unacceptable. In my experience, when programs fail to replicate it is because they were never all that successful in the first place, or because what is disseminated is a form of the model much less robust than the one that was researched.

Generalization can happen. It happens all the time. It has to be planned for, designed for, not just assumed, but it can and does happen. Rather than using failure to replicate as a stick to beat evidence-based policy, let’s agree that we can learn to replicate, and then use every tool at hand to do so. There are so many vulnerable children who need better educations, and we cannot be distracted by arguments that “nothing replicates” that are contradicted by many examples throughout the world.