I’m delighted to see that the idea of large-scale tutoring to combat Covid-19 losses has gotten so important in the policy world that it is attracting scoffers and doubters. Michael Goldstein and Bowen Paulle (2020) published five brief commentaries recently in The Gadfly, warning about how tutoring could fail, both questioning the underlying research on tutoring outcomes (maybe just publication bias?) and noting the difficulties of rapid scale up. They also quote without citation a comment by Andy Rotherham, who quite correctly notes past disasters when government has tried and failed to scale up promising strategies: “Ed tech, class size reduction, teacher evaluations, some reading initiatives, and charter schools.” To these, I would add many others, but perhaps most importantly Supplementary Educational Services (SES), a massive attempt to implement all sorts of after school and summer school programs in high-poverty, low-achieving schools, which had near-zero impact, on average.
So if you were feeling complacent that the next hot thing, tutoring, was sure to work, no matter how it’s done, then you have not been paying attention for the past 30 years.
But rather than argue with these observations, I’d like to explain that the plan I’ve proposed, which you will find here, is fundamentally different from any of these past efforts, and if implemented as designed, with adequate funding, is highly likely to work at scale.
1. Unlike all of the initiatives Rotherham dismisses, unlike SES, unlike just about everything ever used at scale in educational policy, the evidence base for certain specific, well-evaluated programs is solid. And in our plan, only the proven programs would be scaled.
A little known but crucial fact: Not all tutoring programs work. The details matter. Our recent reviews of research on programs for struggling readers (Neitzel et al., in press) and math (Pellegrini et al., in press) identify individual tutoring programs that do and do not work, as well as types of tutoring that work well and those that do not.
Our scale-up plan would begin with programs that already have solid evidence of effectiveness, but it would also provide funding and third-party, rigorous evaluations of scaled-up programs without sufficient evidence, as well as new programs, designed to add additional options for schools. New and insufficiently evaluated programs would be piloted and implemented for evaluation, but they would not be scaled up unless they have solid evidence of effectiveness in randomized evaluations.
If possible, in fact, we would hope to re-evaluate even the most successful evaluated programs, to make sure they work.
If we stick to repeatedly-proven programs, rigorously evaluated in large randomized experiments, then who cares whether other programs have failed in the past? We will know that the programs being used at scale do work. Also, all this research would add greatly to knowledge about effective and ineffective program components and applications to particular groups of students, so over time, we’d expect the individual programs, and the field as a whole, to gain in the ability to provide proven tutoring approaches at scale.
2. Scale-up of proven programs can work if we take it seriously. It is true that scale-up has many pitfalls, but I would argue that when scale-up does not occur it is for one of two reasons. First, the programs being scaled were not adequately proven in the first place. Second, the funding provided for scale-up was not sufficient to allow the program developers to scale up under the conditions they know full well are necessary. As examples of the latter, programs that provided well-trained and experienced trainers in their initial studies are often forced by insufficient funding to use trainer-of-trainers models for greatly diminished amounts of training in scale-up. As a result, the programs that worked at small scale failed in large-scale replication. This happens all the time, and this is what makes policy experts conclude that nothing works at scale.
However, the lesson they should have learned instead is just that programs proven to work at small scale can succeed if the key factors that made them work at small scale are implemented with fidelity at large scale. If anything less is done in scale-up, you’re taking big risks.
If well-trained trainers are essential, then it is critical to insist on well-trained trainers. If a certain amount or quality of training is essential, it is critical to insist on it, and make sure it happens in every school using a given program. And so on. There is no reason to skimp on the proven recipe.
But aren’t all these trainers and training days and other elements unsustainable? This is the wrong question. The right one is, how can we make tutoring as effective as possible, to justify its cost?
Tutoring is expensive, but most of the cost is in the salaries of the tutors themselves. As an analogy, consider horse racing. Horse owners pay millions for horses with great potential. Having done so, do you think they skimp on trainers or training? Of course not. In the same way, a hundred teaching assistants tutors cost roughly $4 million per year in salaries and benefits alone. Let’s say top-quality training for this group costs $500,000 per year, while crummy training costs $50,000. If these figures are in the ballpark, would it be wise to spend $4,500,000 on a terrific tutoring program, or $4,050,000 on a crummy one?
Successful scale-up takes place all the time in business. How does Starbucks make sure your experience in every single store is excellent? Simple. They have well-researched, well specified, obsessively monitored standards and quality metrics for every part of their operation. Scale-up in education can work just the same way, and in comparison to the costs of front-line personnel, the costs of great are trivially greater than the cost of crummy.
3. Ongoing research will, in our proposal, formatively evaluate the entire tutoring effort over time, and development and evaluation will continually add new proven programs.
Ordinarily, big federal education programs start with all kinds of rules and regulations and funding schemes, and these are announced with a lot of hoopla and local and national meetings to explain the new programs to local educators and leaders. Some sort of monitoring and compliance mechanism is put in place, but otherwise the program steams ahead. Several years later, some big research firm gets a huge contract to evaluate the program. On average, the result is almost always disappointing. Then there’s a political fight about just how disappointing the results are, and life goes on.
The program we have proposed is completely different. First, as noted earlier, the individual programs that are operating at large scale will all be proven effective to begin with, and may be evaluated and proven effective again, using the same methods as those used to validate new programs. Second, new proven programs would be identified and scaled up all the time. Third, numerous studies combining observations, correlational studies, and mini-experiments would be evaluating program variations and impacts with different populations and circumstances, adding knowledge of what is happening at the chalkface and of how and why outcomes vary. This explanatory research would not be designed to decide which programs work and which do not (that would be done in the big randomized studies), but to learn from practice how to improve outcomes for each type of school and application. The idea is to get smarter over time about how to make tutoring as effective as it can be, so when the huge summative evaluation takes place, there will be no surprises. We would already know what is working, and how, and why.
Our National Tutoring Corps proposal is not a big research project, or a jobs program for researchers. The overwhelming focus is on providing struggling students the best tutoring we know how to provide. But using a small proportion of the total allocation would enable us to find out what works, rapidly enough to inform practice. If this were all to happen, we would know more and be able to do more every year, serving more and more struggling students with better and better programs.
So rather than spending a lot of taxpayer money and hoping for the best, we’d make scale-up successful by using evidence at the beginning, middle, and end of the process, to make sure that this time, we really know what we are doing. We would make sure that effective programs remain successful at scale, rather than merely hoping they will.
Goldstein, M., & Paulle, B. (2020, Dec. 8) Vaccine-making’s lessons for high-dosage tutoring, Part 1. The Gadfly.
Goldstein, M., & Paulle, B. (2020, Dec. 11). Vaccine-making’s lessons for high-dosage tutoring, Part IV. The Gadfly.
Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.
Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (in press). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open.
Original photo by Catherine Carusso, Presidio of Monterey Public Affairs
This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.
Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to email@example.com.