Where Will the Capacity for School-by-School Reform Come From?

In recent months, I’ve had a number of conversations with state and district leaders about implementing the ESSA evidence standards. To its credit, ESSA diminishes federal micromanaging and gives more autonomy to states and locals. But now that the states and locals are in charge, how are they going to achieve greater success? One state department leader described his situation under ESSA as being like that of a dog who has been chasing cars for years and then finally catches one. Now what?

ESSA encourages states and local districts to help schools adopt and effectively implement proven programs. For school improvement, for portions of Title II, and for Striving Readers, ESSA requires the use of proven programs. Initially, state and district folks were worried about how to identify proven programs, though things are progressing on that front (see, for example, www.evidenceforessa.org). But now I’m hearing a lot more concern about the capacity to help all those individual schools do needs assessments, select proven programs aligned with their needs, and implement them with thought, care, and knowledgeable application of implementation science.

I’ve been in several meetings where state and local folks ask federal folks how they are supposed to implement ESSA. “Regional educational labs will help you!” they suggest. With all due respect to my friends in the RELs, this is going to be a heavy lift. There are ten of them, in a country with about 52,000 Title I schoolwide projects. So each REL is responsible for, on average, five states, 1,400 districts, and 5,200 high-poverty schools. For this reason, RELs have long been primarily expected to work with state departments. There are just not enough of them to serve many individual districts, much less schools.
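
To see how thinly that capacity spreads, here is the back-of-envelope arithmetic (a minimal sketch in Python; the total of roughly 14,000 districts is my own assumption, chosen to be consistent with the per-REL average above, while the other totals come from the paragraph itself):

```python
# Back-of-envelope arithmetic behind the per-REL averages cited above.
# The ~14,000-district total is an assumed figure consistent with the
# "1,400 districts per REL" average; states and schoolwide projects
# use the totals given in the text.
rels = 10
states = 50
districts = 14_000                # assumption, not from the text
title_i_schoolwide = 52_000       # "about 52,000" per the text

print(f"States per REL:    {states / rels:,.0f}")              # 5
print(f"Districts per REL: {districts / rels:,.0f}")           # 1,400
print(f"Schools per REL:   {title_i_schoolwide / rels:,.0f}")  # 5,200
```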

State departments of education and districts can help schools select and implement proven programs. For example, they can disseminate information on proven programs, make sure that recommended programs have adequate capacity, and perhaps hold effective methods “fairs” to introduce people in their state to program providers. But states and districts rarely have the capacity to implement proven programs themselves. It’s very hard to build state and local capacity to support specific proven programs. For example, when state or district funding declines, as it frequently does, the first departments to be cut back or eliminated often involve professional development. For this reason, few state departments or districts have large, experienced professional development staffs. Further, constant changes in state and local superintendents, boards, and funding levels make it difficult to build up professional development capacity over a period of years.

Because of these problems, schools have often been left to make up their own approaches to school reform. This happened on a wide scale in the NCLB School Improvement Grants (SIG) program, where federal mandates specified particular structural changes but left the essentials (teaching, curriculum, and professional development) up to the locals. The MDRC evaluation of SIG schools found that they made no better gains than similar non-SIG schools.

Yet there is substantial underutilized capacity available to help schools across the U.S. adopt proven programs. This capacity resides in the many organizations (both non-profit and for-profit) that originally created the proven programs, provided the professional development that helped those programs meet the “proven” standard, and likely built infrastructure to ensure quality, sustainability, and growth potential.

The organizations that created proven programs have one obvious advantage (their programs are known to work), but they also have several less obvious ones. One is that organizations built to support a specific program have a dedicated focus on that program. They build expertise on every aspect of the program. As they grow, they hire capable coaches, usually ones who have already shown their skills in implementing or leading the program at the building level. Unlike states and districts, which often live in constant turmoil, reform organizations and for-profit professional development organizations are likely to have stable leadership over time. In fact, for a high-poverty school engaged with a program provider, that provider and its leadership may be the only partner stable enough to help it improve its core teaching over many years.

State and district leaders play major roles in accountability, management, quality assurance, and personnel, among many other issues. With respect to implementation of proven programs, they have to set up conditions in which schools can make informed choices, monitor the performance of provider organizations, evaluate outcomes, and ensure that schools have the resources and supports they need. But truly reforming hundreds of schools in need of proven programs one at a time is not realistic for most states and districts, at least not without help. It makes a lot more sense to seek capacity in organizations designed to provide targeted professional development services on proven programs, and then coordinate with these providers to ensure benefits for students.



Evidence and Freedom

One of the strangest arguments I hear against evidence-based reform in education is that encouraging or incentivizing schools to use programs or practices proven to work in rigorous experiments will reduce the freedom of schools to do what they think is best for their students.

Freedom? Really?

To start with, consider how much freedom schools have now. Many districts and state departments of education have elaborate 100% evidence-free processes of restricting the freedom of schools. They establish lists of approved providers of textbooks, software, and professional development, based perhaps on state curriculum standards but also on current trends, fads, political factors, and preferences of panels of educators and other citizens. Many states have textbook adoption standards that consider paper weight, attractiveness, politically correct language, and other surface factors, but never evidence of effectiveness. Federal policies specify how teachers should be evaluated, how federal dollars should be utilized, and how students should be assessed. I could go on for more pages than anyone wants to read with examples of how teachers’ and principals’ choices are constrained by district, state, and federal policies, very few of which have ever been tested in comparison to control groups. Why do schools use this textbook or that software or the other technology? Because their district or state bought it for them, trained them in its use (perhaps), and gave them no alternative.

The evidence revolution offers the possibility of freedom, if the evidence now becoming widely available is used properly. The minimum principle of evidence-based reform should be this: “If it is proven to work, you are allowed to use it.”

At bare minimum, evidence of effectiveness should work as a “get out of jail free” card to counter whatever rules, restrictions, or lists of approved materials schools have been required to follow.

But permission is not enough, because mandated, evidence-free materials, software, and professional development may eat up the resources needed to implement proven programs. So here is a slightly more radical proposition: “Whenever possible, school staffs should have the right, by majority vote of the staff, to adopt proven programs to replace current programs mandated by the district or state.”

For example, when a district or state requires use of anything, it could make the equivalent amount of money available for schools to select and implement programs proven to be effective in producing the desired outcome. If the district adopts a new algebra text or elementary science curriculum, for instance, it could allow schools to select an alternative with good evidence of effectiveness for algebra or elementary science, as long as the school agrees to implement the program with fidelity and care, achieving levels of implementation like those in the research that validated the program.

The next level of freedom to choose what works would be to provide incentives and support for schools that select proven programs and promise to implement them with fidelity.

“Schools should be able to apply for federal, state, or local funds to implement proven programs of their choice. Alternatively, they may receive competitive preference points on grants if they promise to adopt and effectively implement proven programs.”

This principle exists today in the Every Student Succeeds Act (ESSA), where schools applying for school improvement funding must select programs that meet one of three levels of evidence: strong (at least one randomized experiment with positive outcomes), moderate (at least one quasi-experimental [matched] study with positive outcomes), or promising (at least one correlational study with positive outcomes). In seven other programs in ESSA, schools applying for federal funds receive extra competitive preference points on their applications if they commit to using programs that meet one of those three levels of evidence. The principle in ESSA – that use of proven programs should be encouraged – should be expanded to all parts of government where proven programs exist.
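
For readers who want the tier logic at a glance, here is a toy sketch in Python of the three-tier rule as described above; the function name and inputs are my own illustrative choices, not an official ESSA rubric:

```python
# Toy sketch of the ESSA evidence tiers described above. Purely
# illustrative: the real determination also weighs study quality,
# sample size, and settings, not just the design type.
def essa_tier(study_design: str, positive_outcomes: bool) -> str:
    """Map a study design with positive outcomes to an ESSA tier."""
    if not positive_outcomes:
        return "does not qualify"
    tiers = {
        "randomized": "strong",
        "quasi-experimental": "moderate",
        "correlational": "promising",
    }
    return tiers.get(study_design, "does not qualify")

print(essa_tier("randomized", True))       # strong
print(essa_tier("correlational", True))    # promising
print(essa_tier("randomized", False))      # does not qualify
```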

One problem with these principles is that they depend on having many proven programs in each area from which schools can choose. At least in reading and math, grades K-12, this has been accomplished; our Evidence for ESSA website describes approximately 100 programs that meet the top three ESSA evidence standards. More than half of these meet the “strong” standard.

However, we must have a constant flow of new approaches in all subjects and grade levels. Evidence-based policy requires continuing investments in development, evaluation, and dissemination of proven programs. The Institute of Education Sciences (IES), the Investing in Innovation (i3) program, and now the Education Innovation and Research (EIR) grant program, help fulfill this function, and they need to continue to be supported in their crucial work.

So is this what freedom looks like in educational innovation? I would argue that it is. Note that I did not say that programs lacking evidence should be forbidden. Mandating use of programs, no matter how well evaluated, is a path to poor implementation and political opposition. Instead, schools should have the opportunity and the funding to adopt proven programs. If they prefer not to do so, that is their choice. But my hope and expectation is that in a political system that encourages and supports use of proven programs, educators will turn out in droves to use better programs, and schools that were reluctant at first will see and emulate the success their neighbors are having.

Freedom to use proven programs should help districts, states, and the federal government have confidence that they can at long last stop trying to micromanage schools. If policymakers know that schools are making good choices and getting good results, why should they want to get in their way?

Freedom to use whatever is proven to enhance student learning. Doesn’t that have a nice ring to it? Like the Liberty Bell?


You Can Step Twice in the Same River: Systems in Education

You can never step twice in the same river.  At least that is what Greek philosopher Heraclitus said a long time ago, when Socrates was just a pup.  What he meant, of course, was that a river is constantly changing, for reasons large and small, so the river you waded across yesterday, or even a minute ago, is not the same one you wade in now.

This proposition is both obvious and wrong.  Sure, rivers are never 100% the same.  But does it matter?  Imagine, for example, that you somehow drained all the water out of a river.  Within a few days or weeks, it would entirely revive itself.  The reason is that a river is not a “thing.”  It is a system.  In other words, a river exists because there is a certain level of rainfall or groundwater or water from upstream, and then a certain topography (rivers are in low-lying areas, compared to surrounding land).  Those factors create the river, and as long as they exist, the river exists.  So when you wade into a river, you are wading into a system, and (sorry, Heraclitus) it is always the same system, because even if the river is higher or lower or muddier or clearer than usual, the system is always the same, unless something pretty dramatic happens upstream.

So why am I rattling on about rivers?  The point I hope to make is that genuine and lasting change in a school depends on changing the system in which the school operates, not just small parts of the school that will be swept away if the system stays unchanged.

Here’s what I mean from an education reform perspective.  Teachers’ daily practices in classrooms are substantially determined by powerful systems.  Whatever innovations you introduce in a school, no matter how effective in the short term, will be eliminated and forgotten if the rest of the system does not change.  For example, if a school implements a great new math program but does not solve classroom management or attendance problems, the school may not maintain its math reform.  Lasting change in math, for example, might require attending to diversity in achievement levels by providing effective tutoring or small-group assistance.  It might require providing eyeglasses to children who need them.  It might require improving reading performance as well as math.  It might require involving parents.  It might require constant monitoring of students’ math performance and targeted responses to solve problems.  It might require recruiting volunteers, or making good use of after school or summer time.  It might require mobilizing department heads or other math leaders within the school to support implementation, and to help maintain the effective program when (predictable) turmoil threatens it.  Policy changes at the district, state, and national levels may also help, but I’m just focusing for the moment on aspects of the system that an individual school or district can implement on its own.  Attending to all of these factors at once may increase the chances that in five or ten years, the effective program remains in place and stays effective, even if the original principal, department head, teachers, and special funds are no longer at the school.

It’s not that every school has to do all of these things to improve math performance over time, but I would argue that lasting impact will depend on some constellation of supports that change the system in which the math reform operates.  Otherwise, the longstanding system of the school will return, washing away the reform and taking the school back to its pre-reform behaviors and policies.

A problem in all of this is that educational development and research often work against systemic change.  In particular, academic researchers are rewarded for publishing articles, and it helps if they evaluate approaches that purely represent a given theory.  Pragmatically, an approach with many components may be more expensive and more difficult to put in place.  As a result, many of the proven programs available to educators are narrow, focused on the main objective but not on the broader system of the school.  This may be fine in the short run, but narrowly focused treatments may not be maintained over time.

Seen as a system, a river will never change its course until the key elements that determine its course themselves change.  Unless that happens, we’ll always be stepping into the same river, over and over again, and getting the same results.


Transforming Transformation (and Turning Around Turnaround)

At the very end of the Obama Administration, the Institute of Education Sciences (IES) released the final report of an evaluation of the outcomes of the federal School Improvement Grant program. School Improvement Grants (SIG) are major investments to help schools with the lowest academic achievement in their states greatly improve their outcomes.

The report, funded by the independent and respected IES and carried out by the equally independent and respected Mathematica Policy Research, found that SIG grants made essentially no difference in the achievement of the students in schools that received them.


In Baltimore, where I live, we believe that if you spend $7 billion on something, as SIG has so far, you ought to have something to show for it. The disappointing findings of the Mathematica evaluation are bad news for all of the usual reasons. Even if there were some benefits, SIG turned out to be a less-than-compelling use of taxpayers’ funds.  The students and schools that received it really needed major improvement, but improved very little. The findings undermine faith in the ability of very low-achieving schools to turn themselves around.

However, the SIG findings are especially frustrating because they could have been predicted, were in fact predicted by many, and were apparent long before this latest report. There is no question that SIG funds could have made a substantial difference. Had they been invested in proven programs and practices, they would surely have improved student outcomes, just as they did in the research that established those programs’ effectiveness.

But instead of focusing on programs proven to work, SIG forced schools to choose among four models that had never been tried before and were very unlikely to work.

Three of the four models were so draconian that few schools chose them. One involved closing the school, and another, conversion to a charter school. These models were rarely selected unless schools were on the way to doing these things anyway. Somewhat more popular was “turnaround,” which primarily involved replacing the principal and 50% of the staff. The least restrictive model, “transformation,” involved replacing the principal, using achievement growth to evaluate teachers, using data to inform instruction, and lengthening the school day or year.

The problem is that very low achieving schools are usually in low achieving areas, where there are not long lines of talented applicants for jobs as principals or teachers. A lot of school districts just swapped principals between SIG and non-SIG schools. None of the mandated strategies had a strong research base, and they still don’t. Low achieving schools usually have limited capacity to reform themselves under the best of circumstances, and SIG funding required replacing principals, good or bad, thereby introducing instability in already tumultuous places. Further, all four of the SIG models had a punitive tone, implying that the problem was bad principals and teachers. Who wants to work in a school that is being punished?

What else could SIG have done?

SIG could have provided funding to enable low-performing schools and their districts to select among proven programs. This would have maintained an element of choice while ensuring that whatever programs schools chose would have been proven effective, used successfully in other low-achieving schools, and supported by capable intermediaries willing and able to work effectively in struggling schools.

Ironically, SIG did finally offer such an option, but it was too little, too late. In 2015, SIG added two new models, one of which was an Evidence-Based, Whole-School Reform model that allowed schools to use SIG funds to adopt a proven whole-school approach. The U.S. Department of Education carefully reviewed the evidence and identified four approaches with strong evidence and the capacity to expand that could be used under this model. But hardly any schools chose these approaches, because there was little promotion of the new models, and few school, district, or state leaders to this day even know they exist.

The old SIG program is changing under the Every Student Succeeds Act (ESSA). In order to receive school improvement funding under ESSA, schools will have to select from programs that meet the strong, moderate, or promising evidence requirements defined in ESSA. Evidence for ESSA, the free web site we are due to release later this month, will identify more than 90 reading and math programs that meet these requirements.

This is a new opportunity for federal, state, and district officials to promote the use of proven programs and build local capacity to disseminate proven approaches. Instead of being seen as a trip to the woodshed, school improvement funding might be seen as an opportunity for eager teachers and administrators to do cutting edge instruction. Schools using these innovative approaches might become more exciting and fulfilling places to work, attracting and retaining the best teachers and administrators, whose efforts will be reflected in their students’ success.

Perhaps this time around, school improvement will actually improve schools.


Perfect Implementation of Hopeless Methods: The Sinking of the Vasa

If you are ever in Stockholm, you must visit the Vasa Museum. It contains a complete warship launched in 1628 that sank 30 minutes later. Other than the ship itself, the museum contains objects and bones found in the wreck, and carefully analyzed by scientists.

The basic story of the sinking of the Vasa has important analogies to what often happens in education reform.

After the Vasa sank, King Gustav II Adolf, who had commissioned it, called together a commission to find out whose fault it was and to punish the guilty.

Yet the commission, after many interviews with survivors, found that no one had done anything wrong. Three and a half centuries later, modern researchers came to the same conclusion. Everything was in order. The skeleton of the helmsman was found still gripping the steering pole, trying heroically to turn the ship’s bow into the wind to keep it from leaning over.

So what went wrong? The ship could never have sailed. It was built too top-heavy, with too much heavy wood and too many heavy guns on the top decks and too little ballast on the bottom. The Vasa was doomed, no matter what the captain and crew did.

In education reform, there is a constant debate about how much is contributed to effectiveness by a program as opposed to quality of implementation. In implementation science, there are occasionally claims that it does not matter what programs schools adopt, as long as they implement them well. But most researchers, developers, and educators agree that success only results from a combination of good programs and good implementation. Think of the relationship as multiplicative:

P × I = A

(Quality of program times quality of implementation equals achievement gain).

The reason the relationship might be multiplicative is that if either P or I is zero, achievement gain is zero. If both are very positive, then achievement gain is very, very positive.
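
A toy numerical illustration of the multiplicative point (the 0-to-1 scales here are my own assumption for illustration; the text asserts only the multiplicative form):

```python
# Toy illustration of achievement gain (A) as the product of program
# quality (P) and implementation quality (I), each scored 0 to 1 here
# purely for illustration.
def achievement_gain(p: float, i: float) -> float:
    """A = P x I: if either factor is zero, the gain is zero."""
    return p * i

print(achievement_gain(0.0, 1.0))  # 0.0  -- the Vasa: flawless crew, doomed design
print(achievement_gain(0.8, 0.0))  # 0.0  -- a strong program never really implemented
print(achievement_gain(0.8, 0.9))  # 0.72 -- good program, well implemented
```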

In the case of the Vasa, P = 0, so no matter how good implementation was, the ship was doomed. In many educational programs, the same is true. For example, programs that are not well worked out, are not well integrated into teachers’ schedules and skill sets, or are too difficult to implement are unlikely to work. One might argue that in order to have positive effects, a program must be very clear about what teachers are expected to do, so that professional development and coaching can be efficiently targeted to helping teachers do those things. Then we need evidence linking what teachers do to improved student learning. For example, providing teachers with professional development to enhance their content knowledge may not be helpful if teachers are not clear how to put this new knowledge into their daily teaching.

Rigorous research, especially under funding from IES and i3 in the U.S. and from EEF in England, is increasingly identifying proven programs as well as programs that consistently fail to improve student outcomes. The patterns are not perfectly clear, but in general those programs that do make a significant difference are ones that are well-designed, practical, and coherent.

If you think implementation alone will carry the day, keep in mind the skeleton of the heroic helmsman of the Vasa, spending 333 years on the seafloor trying to push the Vasa’s bow into the wind. He did everything right, except for signing on to the wrong ship.


What Schools in One Place Can Learn from Schools Elsewhere

In a recent blog, I responded to an article by Lisbeth Schorr and Srik Gopal about their concerns that the findings of randomized experiments will not generalize from one set of schools to another. I got a lot of supportive response to the blog, but I realize that I left out a key point.

The missing point was this: the idea that effective programs readily generalize from one place to another is not theoretical. It happens all the time. I try to avoid talking about our own programs, but in this case, it’s unavoidable. Our Success for All program started almost 30 years ago, working with African American students in Baltimore. We got terrific results with those first schools. But our first dissemination schools beyond Baltimore included a Philadelphia school primarily serving Cambodian immigrants, rural schools in the South, small town schools in the Midwest, and so on. We had to adapt and refine our approaches for these different circumstances, but we found positive effects across a very wide range of settings and circumstances. Over the years, some of our most successful schools have been ones serving Native Americans, such as a school in the Arizona desert and a school in far northern Quebec. Another category of schools where we see outstanding success is those serving Hispanic students, including English language learners, as in the Alhambra district in Phoenix and a charter school near Los Angeles. One of our most successful districts anywhere is in small-city Steubenville, Ohio. We have established a successful network of SFA schools in England and Wales, where we have extraordinary schools primarily serving Pakistani, African, and disadvantaged White students in a very different policy context from the one we face in the U.S. And yes, we continue to find great results in Baltimore and in cities that resemble our original home, such as Detroit.

The ability to generalize from one set of schools to others is not at all limited to Success for All. Reading Recovery, for example, has had success in every kind of school, in countries throughout the world. Direct Instruction has also been successful in a wide array of types of schools. In fact, I’d argue that it is rare to find programs that have been proven to be effective in rigorous research that then fail to generalize to other schools, even ones that are quite different. Of course, there is great variation in outcomes in any set of schools using any innovative program, but that variation has to do with leadership, local support, resources, and so on, not with a fundamental limitation on generalizability to additional populations.

How is it possible that programs initially designed for one setting and population so often generalize to others? My answer would be that in most fundamental regards, the closer you get to the classroom, the more schools begin to resemble each other. Individual students do not all learn the same way, but every classroom contains a range of students who have a predictable set of needs. Any effective program has to be able to meet those needs, wherever the school happens to be located. For example, every classroom has some number of kids who are confident, curious, and capable, some number who are struggling, some number who are shy and quiet, some number who are troublemakers. Most contain students who are not native speakers of English. Any effective program has to have a workable plan for each of these types of students, even if the proportions of each may vary from classroom to classroom and school to school.

There are reasonable adaptations necessary for different school contexts, of course. There are schools where attendance is a big issue and others where it can be assumed, schools where safety is a major concern and others where it is less so. Schools in rural areas have different needs from those in urban or suburban ones, and obviously schools with many recent immigrants have different needs from those in which all students are native speakers of English. Involving parents effectively looks different in different places, and there are schools in which eyeglasses and other health concerns can be assumed to be taken care of and others where they are major impediments to success. But after the necessary accommodations are made, you come down to a teacher and twenty to thirty children who need to be motivated, to be guided, to have their individual needs met, and to have their time used to greatest effect. You need to have an effective plan to manage diverse needs and to inspire kids to see their own possibilities. You need to fire children’s imaginations and help them use their minds well to write and solve problems and imagine their own futures. These needs exist equally in Peru and Poughkeepsie, in the Arizona desert or the valleys of Wales, in Detroit or Eastern Kentucky, in California or Maine.

Disregarding evidence from randomized experiments because it does not always replicate is a recipe for the status quo, as far as the eye can see. And the status quo is unacceptable. In my experience, when programs fail to replicate it is because they were never all that successful in the first place, or because the replication attempted a form of the model much less robust than the one that was researched.

Generalization can happen. It happens all the time. It has to be planned for, designed for, not just assumed, but it can and does happen. Rather than using failure to replicate as a stick to beat evidence-based policy, let’s agree that we can learn to replicate, and then use every tool at hand to do so. There are so many vulnerable children who need better educations, and we cannot be distracted by arguments that “nothing replicates” that are contradicted by many examples throughout the world.


An Exploded View of Comprehensive School Reform

Recently, I had to order a part for an electric lawnmower. I enjoyed looking at the exploded view of the mower on the manufacturer’s web site. What struck me about it was that so many of the parts were generic screws, bolts, springs, wheels, and so on. With a bit of ingenuity, I’m sure someone (not me!) could track down generic electric motors, mower blades, and other more specialized parts, and build their very own do-it-yourself lawn mower.

There are just a few problems with this idea.

  1. It would cost a lot more than the original mower.
  2. It would take a lot of time that could possibly be used for better purposes.
  3. It wouldn’t work, and you’d end up with an expensive pile of junk to discard.

Why am I yammering on about exploded views of lawn mowers? Because the idea of assembling lawn mowers from generic parts is a lot like what all too many struggling schools do in the name of whole school reform.

In education, the do-it-yourself equivalent is the idea that if you choose one program for reading and another for behavior and a third for parent involvement and a fourth for tutoring and a fifth for English learners and a sixth for formative assessment and a seventh for coaching, the school is bound to do better. It might, but this piecemeal approach is really hard to do well.

The alternative to assembling all of those generic parts is to adopt a comprehensive school improvement model. These are models that have coordinated, well worked-out, well-supported approaches to increasing student success. Our own Success for All program is one of them, but there are others for elementary and secondary schools. After years of encouraging schools receiving School Improvement Grants (SIG) to assemble their own comprehensive reforms (remember the lawn mower?), the U.S. Department of Education finally offered SIG schools the option of choosing a proven whole-school approach. In addition to our Success for All program, the U.S. Department of Education approved three other comprehensive programs based on their evidence of effectiveness: Positive Action, the Institute for Student Achievement, and New York City’s small high schools of choice approach. These all met the Department’s standards because they had at least one randomized experiment showing positive outcomes on achievement measures, but some had a lot more evidence than that.

Comprehensive approaches resemble the fully assembled lawn mower rather than the DIY exploded view. The parts of the comprehensive models may be like those of the do-it-yourself SIG models, but the difference is that the comprehensive models have a well-thought-out plan for coordinating all of those elements. Also, even if a school used proven elements to build its own model, those elements would not have been proven in combination, and each might compete for the energies, enthusiasm, and resources of the beleaguered school staff.

This is the last year of SIG under the old rules, but it will continue in a different form under ESSA. The ESSA School Improvement provisions require the use of programs that meet strong, moderate, or promising evidence standards. Assembling individual proven elements is not a terrible idea, and is a real improvement on the old SIG because it at least requires evidence for some of the parts. But perhaps broader use of comprehensive programs with strong evidence of effectiveness for the whole school-wide approach, not just the parts, will help finally achieve the bold goals of school improvement for some of the most challenging schools in our country.