Scaling Up: Penicillin and Education

In 1928, the Scottish scientist Alexander Fleming discovered penicillin. As the story goes, he discovered it by accident, when he left a petri dish containing bacteria on his desk overnight and the next morning found that it had been contaminated by a mold that had killed the bacteria around it. Fleming isolated the mold and recognized that if it could kill bacteria in a dish, it might be useful in curing many diseases.

Early on it was clear that penicillin had extraordinary possibilities. In World War I, more soldiers and civilians had been killed by bacterial diseases than by bullets. What if these diseases could be cured? Early tests showed very promising effects.

Yet there was a big problem. No one knew how to produce penicillin in quantity. Very small experiments established that penicillin had potential for curing bacterial infections and was not toxic. However, the total world supply at the onset of World War II was about enough for a single adult. The impending need for penicillin was obvious, but it still was not ready for prime time.

American and British scientists eventually began to work together to find a way to scale up production of penicillin. Finally, the Merck Company developed a mass production method and was making billions of units by D-Day.

The key dynamic of the penicillin story has much in common with an essential problem of education reform. The Merck work did not change the structure of penicillin itself, but Merck scientists did a great deal of science and experimentation to find strains that were stable and replicable. In education reform, it is equally the case that developing and initially evaluating a given program may be a very different process from evaluating it at large scale and scaling it up once proven.

In some cases, different organizations may be necessary to do large-scale evaluation and implementation, as was the case with Merck and Fleming; in other cases the same organization may carry through the development, initial evaluation, large-scale evaluation, and dissemination. Whoever is responsible for the various steps, the requirements are similar.

At small scale, innovators are likely to work in schools nearby, where they can visit frequently, see what is going on, hear teachers’ perspectives, and change strategies mid-course in response to what they learn. At small scale, programs might vary a great deal from class to class or school to school. Homemade measures, opinions, observations, and other informal indicators may be all developers need or want. From a penicillin perspective, this is still the Fleming level.

When a program moves to the next level, it may be working in many schools or distant locations, and the approach must change substantially. This is the Merck stage of development in penicillin terms. Developers must have a very clear idea of what the program is, and then provide student materials, software, professional development, and coaching directed toward helping teachers to enact the program effectively. Rather than adapting a great deal to the desires or ideas of every school or teacher, developers can ask principals and teachers to vote on participation, with an understanding that if they decide to participate, they commit to following the program more or less as designed, with reasonable variations in light of unique characteristics of the school (e.g., urban/rural location, presence of English learners, or substantial poverty). Professional development and coaching need to be standardized, with room for appropriate adaptations. Organizations that provide large-scale services need to learn how to manage functions such as finance, human resources, and IT.

As programs grow, they should seek funding for large-scale, randomized evaluations, ideally by third-party evaluators.

In order to get to the Merck level in education reform, we must be ready to build robust, flexible, self-sustaining organizations, capable of ensuring positive impacts of educational programs on a broad scale. Funding from government and private foundations is needed along the way, but the organizations ultimately must be able to operate mostly or entirely on revenues from schools, especially Title I or other funds likely to be available in many or most schools.

Over the years, penicillin has saved millions of lives, due to the pioneering work of Fleming and the pragmatic work of Merck. In the same way, we can greatly enhance the learning of millions of children, combining innovative design and planful, practical scale-up.


Teachers as Professionals in Evidence-Based Reform


In a February 2012 op-ed in Education Week, Don Peurach wrote about a 14-year investigation he carried out as part of a large University of Michigan study of comprehensive school reform. In the overall study, our Success for All program and the America’s Choice program did very well in terms of both implementation and outcomes, while an approach in which teachers largely made up their own instructional approaches did not bring about much change in teachers’ behaviors or student learning. Because both Success for All and America’s Choice have well-specified training, teachers’ manuals, and student materials, the findings support the idea that it is important for school-wide reform models to have a well-structured approach.

Peurach’s focus was on Success for All as an organization. He wanted to know how our network of hundreds of schools in 40 states contributes to the development of the approach and to each other’s success. His key finding was that Success for All is not a top-down approach, but is constantly learning from its teachers and principals and then spreading good practices throughout the network.

In our way of thinking, this is the very essence of professionalism. A teacher who does wonderful, innovative things in one class is perhaps benefiting 25 children each year, but one whose ideas scale up to inform practice in hundreds or thousands of schools is making a real difference. Yet in order for teachers’ ideas to have broad impact, it helps a great deal for the teachers to be part of a national or regional network that speaks a common language and has common standards of practice.

Teachers need not be researchers to contribute to their profession. By participating in networks of like-minded educators – implementing, continuously improving, and communicating about proven, practical approaches intended to improve student outcomes – they play an essential role in the improvement of their profession.

Improvement by Design


I just read a very interesting book, Improvement by Design: The Promise of Better Schools, by David Cohen, Donald Peurach, Joshua Glazer, Karen Gates, and Simona Goldin. From 1996 to 2008, researchers originally at the University of Michigan studied three of the largest comprehensive school reform models of the time: America’s Choice (AC), Accelerated Schools Plus (ASP), and our own Success for All (SFA). A portion of the study, led by Brian Rowan, compared 115 elementary schools using one of these models to a matched control group and to each other. The quantitative study found that Success for All had strong impacts on reading achievement by third grade, America’s Choice had strong impacts on writing, and there were few impacts of Accelerated Schools Plus.

Improvement by Design tells a different story, based on qualitative studies of the three organizations over a very long time period. Despite sharp differences between the models, all of the organizations had to face a common set of challenges: creating viable models and organizations to support them, dealing with rapid scale-up through the 1990s (especially during the time period from 1997 to 2002 when Obey-Porter Comprehensive School Reform funding was made available to schools), and then managing catastrophe when the George W. Bush Administration ended comprehensive school reform.

The book is straightforward history, comparing and contrasting these substantial reform efforts, and does not directly draw policy conclusions. However, there is much in it that does have direct policy consequences. These are my conclusions, not the authors’, but I think they are consistent with the history.

1. Large-scale change that dramatically changes daily teaching is difficult but not impossible in high-poverty schools. All three models have worked in hundreds of schools, as have several other whole-school reform models.

2. Providing general principles and then leaving schools to create the details for themselves is not a successful strategy. This is what Accelerated Schools Plus tried to do, and the Michigan study not only found that ASP failed to change student outcomes, but also that it failed to have much observable impact on teaching, in contrast to AC and SFA.

3. What (2) implies is that if whole-school “improvement by design” is to succeed in the thousands of Title I schools that need it, large, well-managed, and well-capitalized organizations are necessary to provide high-quality and very specific training, coaching, and materials to implement proven models.

4. Federal policies (at least) need to be consistently hospitable to an environment in which schools and districts are choosing among many proven whole-school models. For example, federal requests for proposals might have a few competitive preference points for schools proposing to use whole-school reform models with strong evidence of effectiveness. This would signal an invitation to adopt such models without forcing schools to do so and risking extensive pushback. Further, federal policies promoting use of proven whole-school models should remain in effect for an extended period. Turmoil introduced by changing federal support for whole-school reform was very damaging to earlier efforts.

Improvement by Design provides a tantalizing glimpse of what could be possible in a system that offers a diversity of proven, whole-school options to high-poverty schools. This approach to reform has many obstacles to overcome, of course. But that would be true of any approach radical enough and scalable enough to have a chance of reforming American education.

Success in Evidence-Based Reform: The Importance of Failure

As always, Winston Churchill said it best: “Success consists of going from failure to failure without loss of enthusiasm.” There is a similar Japanese saying: “Success is being knocked down seven times and getting up eight.”

These quotes came to my mind while I was reading a recently released report from the Aspen Institute, “Leveraging Learning: The Evolving Role of Federal Policy in Education Research.” The report is a useful scan of the education research horizon, intended as background for the upcoming reauthorization of the Education Sciences Reform Act (ESRA), the legislation that authorizes the Institute of Education Sciences (IES). However, the report also contains brief chapters by various policy observers (including myself), focusing on how research might better inform and improve practice and outcomes in education. A common point of departure in several of these chapters was that while the randomized experiments (RCTs) emphasized for the past decade by IES and, more recently, by Investing in Innovation (i3) are all well and good, the IES experience is that most randomized evaluations of educational programs find few achievement effects. Several cited testimony by Jon Baron that “of the 90 interventions evaluated in randomized trials by IES, 90% were found to have weak or no positive effects.” In response, the chapter authors proposed various ways in which IES could add to its portfolio more research that does not rely on RCTs.

Within the next year or two, the problem Baron was reporting will take on a great deal of importance. The results of the first cohort of Investing in Innovation grants will start being released. At the same time, additional IES reports will appear, and the Education Endowment Foundation (EEF) in the U.K., much like i3, will also begin to report outcomes. All four of the first cohort of scale-up programs funded by i3 (our Success for All program, Reading Recovery, Teach for America, and KIPP) have had positive first-year findings in i3 or similar evaluations recently, but this is not surprising, as they had to pass a high evidence bar to get scale-up funding in the first place. The much larger number of validation and development projects were not required to have such strong research bases, and many of these are sure to show no effects on achievement. Kevan Collins, Director of the EEF, has always openly said that he’d be delighted if 10% of the studies EEF has funded find positive impacts. Perhaps in the country of Churchill, Collins is better placed to warn his countrymen that success in evidence-based reform is going to require some blood, sweat, toil, and tears.

In the U.S., I’m not sure if policymakers or educators are ready for what is about to happen. If most i3 validation and development projects fail to produce significant positive effects in rigorous, well-conducted evaluations, will opinion leaders celebrate the programs that do show good outcomes and value the knowledge gained from the whole process, including knowledge about what almost worked and what to avoid doing next time? Will they support additional funding for projects that take these learnings into account? Or will they declare the i3 program a failure and move on to the next set of untried policies and practices?

I very much hope that i3 or successor programs will stay the course, insisting on randomized experiments and building on what has been learned. Even if only 10% of validation and development projects report clear, positive achievement outcomes and capacity to go to scale, there will be many reasons to celebrate and stay on track:

1. There are currently 112 i3 validation and development projects (plus 5 scale-ups). If just 10% of these were found to be effective and scalable, that would be 11 new programs. Adding this to the scale-up programs and other programs already positively reviewed in the What Works Clearinghouse, this would be a substantial base of proven programs. In medicine, the great majority of treatments initially evaluated are found not to be effective, yet the medical system of innovation works because the few proven approaches make such a big difference. Failure is fine if it leads to success.

2. Among the programs that do not produce statistically significant positive outcomes on achievement measures, there are sure to be many that show promise but do not quite reach significance. For example, any program whose evaluation shows a student-level positive effect size of, say, +0.15 or more should be worthy of additional investment to refine and improve its procedures and its evaluation to reach a higher standard, rather than being considered a bust.

3. The i3 process is producing a great deal of information about what works and what does not, what gets implemented and what does not, and the match between schools’ needs and programs’ approaches. These learnings should contribute to improvements in new programs, to revisions of existing programs, and to the policies applied by i3, IES, and other funders.

4. As the findings of the i3 and IES evaluations become known, program developers, grant reviewers, and government leaders should get smarter about what kinds of approaches are likely to work and to go to scale. Because of this, one might imagine that even if only 10% of validation and development programs succeed in RCTs today, higher and higher proportions will succeed in such studies in the future.
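Point 2 above turns on statistical power: an effect size of +0.15 is educationally meaningful but hard to detect. A quick power calculation (my illustration, not from the original text; it uses the standard normal approximation for a two-group comparison of means) shows why evaluations of modest size routinely miss significance on true effects of this magnitude:

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate students needed per group to detect a standardized
    effect size d in a two-group comparison of means, using the usual
    normal-approximation formula n = 2 * ((z_alpha + z_power) / d)^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = z.inv_cdf(power)          # quantile for desired power
    return 2 * ((z_alpha + z_power) / d) ** 2

# Detecting d = +0.15 with 80% power takes roughly 700 students per group,
# far more than many program evaluations enroll.
print(round(n_per_group(0.15)))  # 698
```

So a study with a few hundred students per condition could easily show a true +0.15 effect yet fall short of statistical significance, which is exactly why such results merit refinement and re-evaluation rather than being written off as failures.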

Evidence-based reform, in which promising scalable approaches are ultimately evaluated in RCTs or similarly rigorous evaluations, is the best way to create substantial and lasting improvements in student achievement. Failures of individual evaluations or projects are an expected, even valued part of the process of research-based reform. We need to be prepared for them, and to celebrate the successes and the learnings along the way.

As Churchill also said, “Success is not final, failure is not fatal: it is the courage to continue that counts.”



Can Educational Innovations Go To National Scale?


In conversations about evidence-based reform, I often hear the objection that “we don’t really know how to take proven innovations to scale” or that “in order for schools or districts to adopt innovations, they must have a central role in creating and disseminating them locally.”

These assumptions turn out to be false. There are in fact many instances in which programs not developed by the educators using them have been widely and enthusiastically adopted by schools all over the U.S.

National Diffusion Network (NDN)
First, there was the National Diffusion Network (NDN). From 1979 to 1996, NDN invited program developers of all kinds to have their programs reviewed by a Joint Dissemination Review Panel, which certified each program’s effects, likelihood of going to scale, and practical utility.

The program made “developer-dissemination” grants (at about $25,000 per year) to developers of promising programs. State facilitators were established in each state to promote the use of the appropriate programs. By the end of the NDN funding, thousands of schools were using one of more than 500 programs.

Comprehensive School Reform (CSR)
Beginning in 1991, a coalition of large corporations established New American Schools (NAS) to help fund innovators to create comprehensive whole-school reform models. Out of 700 applications, 11 were initially selected, and 7 of these were maintained after initial testing. These models began to be used in hundreds of schools collectively. NAS helped identify target districts, in which it held “effective methods fairs.” Hundreds of principals, teachers, and school board members came to learn about the models. They could ask representatives of one or more models to present at their schools. They then had a chance to contract with the models they chose. Starting in 1998, the Obey-Porter Act in Congress established incentive funding of at least $50,000 per year for three years for schools to implement comprehensive school reforms of their choice. This caused an outpouring of interest both in the NAS models and in others that were assembled to resemble NAS models. Within a few years, there were more than 2,500 Title I schools receiving CSR funding and another 3,500 schools adopting these models without CSR funding, mostly using existing Title I funds.

Evaluations of the CSR models began in the 1990s and continued into the early 2000s. They found consistent positive effects for some of the programs, especially the Comer School Development Program, America’s Choice, Modern Red Schoolhouse, and our Success for All program. Obey-Porter funding ended in 2003, but many of the school programs continued without Obey-Porter for many years, up to the present.

Investing in Innovation (i3)
The election of Barack Obama in 2008 brought in an administration eager to expand the use of research-proven programs in education and other fields. In a program called Investing in Innovation (i3), $650 million was set aside to fund educational programs in one of three categories: scale-up, validation, or development. To qualify for scale-up grants, programs had to have strong, positive, replicated outcomes in rigorous evaluations. Validation required a single positive study, and development grants only required a strong theory of action. Scale-up grantees received $50 million over five years to evaluate and scale up their reforms, while validation projects received $30 million and development projects $5 million. A total of 47 programs, including 4 scale-up projects (Success for All, Reading Recovery, KIPP, and Teach for America), received funding in the first round. In years after the first, annual i3 funding was reduced to $150 million, and grants in each category were cut in half. After four rounds of funding, 77 development, 35 validation, and 5 scale-up projects have been funded. It is too early to say how these grants will work out, but scale-up and validation projects are working in hundreds of additional schools under i3 funding and are developing capacity to do more. All of the programs will be rigorously evaluated by third-party evaluators.

NDN, CSR, and i3 have established beyond any doubt that:

1. With encouragement and modest funding, thousands of schools will eagerly adopt research-based programs.
2. Organizations willing and able to support school adoptions nationally will come forward and operate effectively if government helps schools with initial funding barriers.
3. Many whole-school reform models have developed strong evidence of effectiveness, but a strong evidence base without government encouragement and incentives does not lead to robust adoptions.
4. The idea that whole-school reforms must be created by the schools that use them has clearly been disproved. Schools are willing and able to adopt proven programs developed elsewhere if they can afford them.

As reforms in federal education programs such as Title I, School Improvement Grants, and Race to the Top go forward, it makes sense to continue to develop, evaluate, and disseminate whole-school reform models. This approach can expand rapidly while maintaining quality at scale and can improve outcomes for millions of disadvantaged children.

Lessons from Innovators: Reading Recovery


The process of moving an educational innovation from a good idea to widespread effective implementation is far from straightforward, and no one has a magic formula for doing it. The William T. Grant and Spencer Foundations, with help from the Forum for Youth Investment, have created a community composed of grantees in the federal Investing in Innovation (i3) program to share ideas and best practices. Our Success for All program participates in this community. In this space, I, in partnership with the Forum for Youth Investment, highlight observations from the experiences of i3 grantees other than our own, in an attempt to share the thinking and experience of colleagues out on the front lines of evidence-based reform.

This blog is based on an interview between the Forum for Youth Investment and Jerry D’Agostino, Professor of Education at the Ohio State University and Director of Reading Recovery’s i3 project. A persistent challenge for programs that have scaled up is how to sustain for the long term. In this interview, D’Agostino shares how this long-standing literacy intervention has dealt with the challenge and how it has reinvented itself over the years in order to stay current.

Stay Fresh
Reading Recovery is a research-based, short-term intervention that involves one-to-one teaching for the lowest-achieving first graders. It began in New Zealand in the 1970s but has been in operation in the United States for 30 years and has spread across the country. Over the years, Reading Recovery has expanded and contracted depending on funding, interest from school districts, and its own capacity. Today there are training centers at 19 universities that equip teachers to deliver the intervention, and the program has a presence in some 8,000 schools across 49 states. With that kind of scale and longevity, it can be easy to become complacent and assume the intervention speaks for itself. D’Agostino says just the opposite is true. “We know that being the old brand that has been around for a long time can be hard,” he notes. “You have to think about how to keep the brand fresh. Superintendents want the newest hot thing. Teachers have to know it will work with their kids in their classrooms. We have spent time focused on how to adjust the model to offer new features and respond to current education trends such as the Common Core. You always have to show teachers and administrators how the intervention addresses the issue of the day. For example, it isn’t enough that the intervention produces strong effect sizes. For teachers, that is a meaningless number. They want to know that the program will help their third graders achieve the literacy level now required in nearly 40 states to be promoted to 4th grade.”

Be Flexible but Maintain Your Core
Reading Recovery has taken seriously the idea of identifying the intervention’s core elements and also responding to the educational system’s current needs. They know that one-to-one instruction and 30-minute daily lessons are non-negotiable, but they also recognize that adaptations are needed. For example, innovations in the lesson framework have resulted in a design for classroom instruction (Literacy Collaborative), small groups (Comprehensive Intervention Model), and training for special education and ESL teachers (Literacy Lessons). “Our innovations have come as direct requests from schools,” says D’Agostino. “For example, a school says they need something for English Language Learners and we develop something new for that one school that then becomes a part of our overall product line. It allows growth for Reading Recovery and flexibility for schools.” Another non-negotiable is keeping training centralized. Although teacher leaders can receive training at one of the 19 partner universities, there are only a few places where trainers of teacher leaders can get certified. That allows Reading Recovery to maintain some quality control and fidelity over teacher leader training. “I’ve always been impressed with the fidelity of Reading Recovery instruction,” said D’Agostino. “I’ve seen Reading Recovery lessons in Zanesville, Ohio and Dublin, Ireland. The framework is the same, but each lesson is different in terms of how the teacher interacts with the student to scaffold literacy learning.”

Combine Historical Expertise with Fresh Perspective
D’Agostino is quick to note that one of Reading Recovery’s strengths and challenges is the longevity of its founders and senior leadership. Many of the original developers of the intervention are still in leadership positions. This allows for a historical perspective and continuity of purpose that are rare in education these days. It can also hinder innovation. That is why the organization also tries to find leadership positions for newer faculty and teachers with recent teaching and administrative experience who can bring fresh ideas and a willingness to push for some of the new adjustments to the model that schools are requesting.

Adapt, Adjust, and Meet Schools Where They Are
D’Agostino emphasizes that Reading Recovery’s current success and long history is no reason to sit back and relax. “We have survived a lot of changes over the years. We’ve grown, we’ve shrunk, we’ve survived major threats to our program from other national initiatives. Right now with our i3 grant, we are in a great position. We are going to reach our goal of training 3,700 teachers and producing good effects. But I don’t know that that will position us well for the future. In fact, I won’t be happy if we just reach our goals.” Sustaining an effective intervention and bringing it to more schools and students around the country means innovating, moving, pushing to the next level…and spreading the word. “Schools don’t necessarily hear about government funded initiatives that achieve high evidence standards according to the What Works Clearinghouse,” muses D’Agostino. “They hear from hundreds of vendors each year citing their effectiveness, so how do we distinguish ourselves? We can’t just assume success in our i3 grant will lead to sustainability. Sustainability is all about results. For example, we know that the outcomes are remarkable – most of the lowest-achieving first graders accelerate with Reading Recovery and reach the average of their cohort – but we also know from our annual evaluation that there’s a great deal of variation across schools and teachers. So right now we want to know, what do effective Reading Recovery teachers do and how is that different from less effective Reading Recovery teachers? Knowing more about that black box of teaching will help the intervention overall. And understanding how to foster local ownership will give the intervention its real staying power.”

Many Programs Meet New Evidence Standards


One of the most common objections to evidence-based reform is that there are too few programs with strong evidence of effectiveness to start encouraging schools to use proven programs. The concern is that it looks bad if a policy of “use what works” leads educators to look for proven programs, only to find that there are very few such programs in a given area, or that there are none at all.

The lack of proven programs is indeed a problem in some areas, such as science and writing, but it is not a problem in others, such as reading and math. There is no reason to hold back on encouraging evidence where it exists.

The U.S. Department of Education has proposed changes to its EDGAR regulations to define “strong” and “moderate” levels of evidence supporting educational programs. These standards use information from the What Works Clearinghouse (WWC), and are very similar to those used in the federal Investing in Innovation (i3) program to designate programs eligible for “scale-up” or “validation” grants, respectively.

As an exercise, my colleagues and I checked to see how many elementary reading programs currently exist that qualify as “strong” or “moderate” according to the new EDGAR standards. This necessitated excluding WWC-approved programs that are not actively disseminated and those that would not meet current WWC standards (2.1 or 3.0), and adding programs not yet reviewed by WWC but that appear likely to meet its standards.

Here’s a breakdown of what we found.

Beginning Reading (K-1)
Total programs: 26
School/classroom programs: 16
Small-group tutoring: 4
1-1 tutoring: 6

Upper Elementary Reading (2-6)
Total programs: 17
School/classroom programs: 12
Small-group tutoring: 4
1-1 tutoring: 1

The total number of unique programs is 35 (many of the programs covered both beginning and upper-elementary reading). Of these, only four met the EDGAR “strong” criterion, but the “moderate” category, which requires a single rigorous study with positive impacts, had 31 programs.

We’ll soon be looking at secondary reading and elementary and secondary math, but the pattern is clear. While few programs will meet the highest EDGAR standard, many will meet the “moderate” standard.

Here’s why this matters. The EDGAR definitions can be referenced in any competitive request for proposals to encourage and/or incentivize the use of proven programs, perhaps offering two competitive preference points for proposals to implement programs meeting the “moderate” standard and three points for proposals to adopt programs meeting the “strong” standard.

Since there are many programs to choose from, educators will not feel constrained by this process. In fact, many may be happy to learn about the many offerings available, and to obtain objective information on their effectiveness. If none of the programs fit their needs, they can choose something unevaluated and forgo the extra points, but even then, they will have considered evidence as a basis for their decisions. And that would be a huge step forward.