Avoiding the Errors of Supplemental Educational Services (SES)

“The definition of insanity is doing the same thing over and over again, and expecting different results.” –Albert Einstein

Last Friday, the U.S. Senate and House of Representatives passed a $1.9 trillion recovery bill. Within it is the Learning Recovery Act (LRA). Both the overall bill and the Learning Recovery Act are timely and wonderful. In particular, the LRA emphasizes the importance of using research-based tutoring to help students who are struggling in reading or math. The linking of evidence to large-scale federal education funding began with the 2015 ESSA definition of proven educational programs, and the LRA would greatly increase the importance of evidence-based practices.

But if you sensed a “however” coming, you were right. The “however” is that the LRA requires investments of substantial funding in “school extension programs,” such as “summer school, extended day, or extended school year programs” for vulnerable students.

This is where the Einstein quote comes in. “School extension programs” sound a lot like Supplemental Educational Services (SES), part of No Child Left Behind that offered parents and children an array of services that had to be provided after school or in summer school.

The problem is, SES was a disaster. A meta-analysis of 28 studies of SES by Chappell et al. (2011) found a mean effect size of +0.04 for math and +0.02 for reading. A sophisticated study by Deke et al. (2014) found an effect size of +0.05 for math and -0.03 for reading. These effect sizes are just different flavors of zero. Zero was the outcome whichever way you looked at the evidence, with one awful exception: in the Deke et al. (2014) study, the lowest achievers and special education students actually performed significantly less well if they were in SES than if they qualified but did not sign up, with effect sizes around -0.20 in both reading and math. Heinrich et al. (2010) also reported that the lowest achievers were least likely to sign up for SES, and least likely to attend regularly if they did. All three major studies found that outcomes did not vary much with the type of provider or program students received. Considering that the per-pupil cost was estimated at $1,725 in 2021 dollars, these outcomes are distressing. More important, despite the federal government’s willingness to spend quite a lot, millions of struggling students in desperate need of effective assistance did not benefit.
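For readers less familiar with the metric: the effect sizes cited here and throughout these posts are standardized mean differences, in which the difference between treatment and control group means is divided by the standard deviation of the outcome, so values near zero mean the groups were essentially indistinguishable:

```latex
% Standardized mean difference, the effect size (ES) metric used in these reviews
ES = \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{SD}
% e.g., ES = +0.04 means the treatment group scored only 4% of a
% standard deviation higher than the control group on the outcome measure.
```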

Why did SES fail? I have two major explanations. Heinrich et al. (2010), who added questionnaires and observations to find out what was going on, discovered that at least in Milwaukee, attendance in SES after-school programs was appalling (as I reported in my previous blog). In the final year studied, only 16% of eligible students were attending: fewer than half signed up at all, and of those who did, average attendance in the remedial program was only 34%. Worse, the students in greatest need were least likely to attend.

From their data and other studies they cite, Heinrich et al. (2010) paint a picture of students doing boring, repetitive worksheets unrelated to what they were doing in their school-day classes. Providers attracted students to SES services with incentives such as iPods, gift cards, or movie passes. Students often attended just enough to get their incentives, then stopped coming. In 2006-2007, a new policy limited incentives to educationally-related items, such as books and museum trips, and attendance dropped further. Restricting SES services to after school and summertime, when attendance is not mandated and far from universal, meant that students who did attend were in school while their friends were out playing. This is hardly a way to engage students’ motivation to attend or to exert effort. Low-achieving students see after school and summertime as their free time, which they are unlikely to give up willingly.

Beyond the problems of attendance and motivation in extended time, there was another key problem with SES: none of the hundreds of programs offered to students was proven effective beforehand (or ever) in rigorous evaluations. And there was no mechanism to find out which of them were working well until very late in the program’s history. As a result, neither schools nor parents had any particular basis for selecting programs according to their likely impact. Program providers probably did their best, but there was no pressure on them to make certain that students benefited from SES services.

As I noted in my previous blog, evaluations of SES do not provide the only evidence that after school and summer school programs rarely work for struggling students. Reviews of summer school programs by Xie et al. (2020) and of after school programs (Dynarski et al., 2003; Kidron & Lindsay, 2014) have found similar outcomes, always for the same reasons: poor attendance and poor motivation among students asked to be in school when they would otherwise have free time.

Designing an Effective System of Services for Struggling Students

Two policies are needed to provide a system of services capable of substantially improving student achievement. The first is to provide services during the ordinary school day and year, not after school or in summer school. The second is to strongly emphasize the use of programs proven to be highly effective in rigorous research.

Educational services provided during the school day are far more likely to be effective than those provided after school or in the summer. During the day, everyone expects students to be in school, including the students themselves. There are attendance problems during the regular school day, of course, especially in secondary schools, but these problems are much smaller than those in non-school time, and perhaps if students are receiving effective, personalized services in school and therefore succeeding, they might attend more regularly. Further, services during the school day are far easier to integrate with other educational services. Principals, for example, are far more likely to observe tutoring or other services if they take place during the day, and to take ownership for ensuring their effectiveness. School day services also entail far fewer non-educational costs, as they do not require changing bus schedules, cleaning and securing schools more hours each day, and so on.

The problem with in-school services is that they can disrupt the basic schedule. However, this need not be a problem. Schools could designate service periods for each grade level spread over the school day, so that tutors or other service providers can be continuously busy all day. Students should not be taken out of reading or math classes, but there is a strong argument that a student who is far below grade level in reading or math needs a reading or math tutor using a proven tutoring model far more than other classes, at least for a semester (the usual length of a tutoring sequence).

If schools are deeply reluctant to interrupt any of the ordinary curriculum, then they might extend their day to offer art, music, or other subjects during the after-school session. These popular subjects might attract students without incentives, especially if students have a choice of which to attend. This could create space for tutoring or other services during the school day. A schedule like this is virtually universal in Germany, which provides all sports, art, music, theater, and other activities after school, so all in-school time is available for academic instruction.

Use of proven programs makes sense throughout the school day. Tutoring should be the main focus of the Learning Recovery Act, because in this time of emergency need to help students recover from Covid school closures, nothing less will do. But in the longer term, adoption of proven classroom programs in reading, math, science, writing, and other subjects should provide a means of helping students succeed in all parts of the curriculum (see www.evidenceforessa.org).

In summer 2021, there may be a particularly strong rationale for summer school, assuming schools are otherwise able to open. The evidence is clear that doing ordinary instruction during the summer will not make much of a difference, but summer could be helpful if it is used as an opportunity to provide as many struggling students as possible with in-person, one-to-one or one-to-small group tutoring in reading or math. In the summer, students might receive tutoring more than once a day, every day, for as long as six weeks. This could make a particularly big difference for students who basically missed in-person kindergarten, first, or second grade, a crucial time for learning to read. Tutoring is especially effective in reading in those grades, because phonics is relatively easy for tutors to teach. Also, there is a large number of effective tutoring programs for grades K-2. Early reading failure is very important to prevent, and can be prevented with tutoring, so the summer months may be just the right time to help these students get a leg up on reading.

The Learning Recovery Act can make life-changing differences for millions of children in serious difficulties. If the LRA changes its emphasis to the implementation of proven tutoring programs during ordinary school times, it is likely to accomplish its mission.

SES served a useful purpose in showing us what not to do. Let’s take advantage of these expensive lessons and avoid repeating the same errors. Einstein would be so proud if we heed his advice.

Correction

My recent blog, “Avoiding the Errors of Supplemental Educational Services,” started with a summary of the progress of the Learning Recovery Act. It was brought to my attention that my summary was not correct. In fact, the Learning Recovery Act has been introduced in Congress, but it is not part of the current reconciliation proposal and has not become law. The Congressional action cited in my last blog referred to a non-binding budget resolution, whose recent passage facilitated the creation of the $1.9 trillion reconciliation bill that is currently moving through Congress. Finally, while the current reconciliation bill is expected to include some funding to address the issues discussed in my blog, reconciliation rules will prevent the Learning Recovery Act, as introduced, from being included in the current legislation.

References

Chappell, S., Nunnery, J., Pribesh, S., & Hager, J. (2011). A meta-analysis of Supplemental Education Services (SES) provider effects on student achievement. Journal of Education for Students Placed at Risk, 16 (1), 1-23.

Deke, J., Gill, B., Dragoset, L., & Bogen, K. (2014). Effectiveness of supplemental educational services. Journal of Research on Educational Effectiveness, 7, 137-165.

Dynarski, M. et al. (2003). When schools stay open late: The national evaluation of the 21st Century Community Learning Centers Programs (First year findings). Washington, DC: U.S. Department of Education.

Heinrich, C. J., Meyer, R. H., & Whitten, G. W. (2010). Supplemental Education Services under No Child Left Behind: Who signs up and what do they gain? Educational Evaluation and Policy Analysis, 32, 273-298.

Kidron, Y., & Lindsay, J. (2014). The effects of increased learning time on student academic and nonacademic outcomes: Findings from a meta‑analytic review (REL 2014-015). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Appalachia.

Xie, C., Neitzel, A., Cheung, A., & Slavin, R. E. (2020). The effects of summer programs on K-12 students’ reading and mathematics achievement: A meta-analysis. Manuscript submitted for publication.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Why Isn’t Achievement Whirled Enough by Time? (Why Summer School, After School, and Extended Day Do Not Work Very Well)

“Had we but world enough and time…” wrote Andrew Marvell, a 17th-century English poet. (He also had another job, highly relevant to this blog, which I will reveal at the end. No peeking!)

Marvell’s poem was about making the most of the limited time we have on Earth. In education, we understand this sentiment. Time is a key resource for teaching, not to be wasted under any circumstances.

In fact, educators have long tried to improve students’ achievement by increasing their time in school. In particular, struggling students have been invited or required to attend after school or summer school classes.

Many school reformers have advocated expanded opportunities for extra-time instruction as a solution to the learning losses due to Covid-19 school closures. In fact, the current draft of the Democrats’ relief bill emphasizes investments in after school and summer school programs to help these students catch up. Yet these very expensive efforts have not had much impact on reading or math learning in studies done before Covid, and are not likely to have much impact now (see my previous blog on this topic).

How can this be? Summer school, for example, offers several weeks of extra teaching in small classes tailored to the learning levels of the students. Yet summer school for reading has been completely ineffective, except for tutoring phonics in K-1. Math summer school studies involving disadvantaged and low-achieving students also found effect sizes near zero (Xie et al., 2020).

With respect to after-school programs, a review by Kidron & Lindsay (2014) found average effect sizes near zero.

A study in Milwaukee by Heinrich et al. (2010) of after school programs provided under Supplemental Education Services (SES) funding found effect sizes near zero for middle and high school students. The authors investigated the reasons for these disappointing findings. Among eligible students, 57% registered in the first year, dropping to 48% by the fourth year. Yet the bigger problem was attendance. As a percentage of registered students, attendance dropped from 90% in the first year to 34% in the fourth, meaning that among all eligible students, only 16% attended in the final year. This abysmal attendance rate should not be surprising in light of the study’s observation that most of the after-school time was spent on worksheets, with little or no instruction. The Heinrich et al. (2010) paper contained the following depressing sentence:

“…one might also speculate that parents and students are, in fact, choosing rationally in not registering for or attending SES.” (p. 296).

Reviews of research on the impacts of all approaches to SES find average effects that are appalling (e.g., Chappell et al., 2011). I will write more about SES as a cautionary tale in a later blog, but one conclusion important to this blog is clear: Providing educational programs to struggling students after school or in the summer is unlikely to improve their achievement.

The reason that additional time after school or in the summer does not enhance achievement is obvious, if you’ve ever been a teacher or a student. No one wants to be sitting in school while their friends are out playing. Extra-time approaches that simply provide more of the same are probably boring, tedious, and soul-sapping. Imagine kids watching the clock, quietly cheering for every tick. It is no wonder that students fail to register or to attend after school or summer school sessions, and learn little in them if they do.

The poet Andrew Marvell had it right. What is important is to make effective use of the time we have, rather than adding time. And his profession, other than being a poet? He was a tutor.

References

Chappell, S., Nunnery, J., Pribesh, S., & Hager, J. (2011). A meta-analysis of Supplemental Education Services (SES) provider effects on student achievement. Journal of Education for Students Placed at Risk, 16 (1), 1-23.

Heinrich, C. J., Meyer, R. H., & Whitten, G. W. (2010). Supplemental Education Services under No Child Left Behind: Who signs up and what do they gain? Educational Evaluation and Policy Analysis, 32, 273-298.

Kidron, Y., & Lindsay, J. (2014). The effects of increased learning time on student academic and nonacademic outcomes: Findings from a meta‑analytic review (REL 2014-015). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Appalachia.

Xie, C., Neitzel, A., Cheung, A., & Slavin, R. E. (2020). The effects of summer programs on K-12 students’ reading and mathematics achievement: A meta-analysis. Manuscript submitted for publication.


Highlight Tutoring Among Post-Covid Solutions

I recently saw a summary of the education section of the giant, $1.9 trillion proposed relief bill now before Congress. Like all educators, I was delighted to see the plan to provide $130 billion to help schools re-open safely, and to fund efforts to remedy the learning losses so many students have experienced due to school closures.

However, I was disappointed to see that the draft bill suggests that educators can use whatever approaches they like, and it specifically mentioned summer school and after school programs as examples.

Clearly, the drafters of this legislation have not been reading my blogs! On September 10th I wrote a blog reviewing research on summer school and after school programs as well as tutoring and other approaches. More recently, I’ve been doing further research on these recommendations for schools to help struggling students. I put my latest findings into two tables, one for reading and one for math. These appear below.

As you can see, not all supplemental interventions for struggling students are created equal. Proven tutoring models (ones that were successfully evaluated in rigorous experiments) are far more effective than other strategies. The additional successful strategy is our own Success for All whole-school reform approach (Cheung et al., in press), but Success for All incorporates tutoring as a major component.

However, it is important to note that not all tutoring programs are proven to be effective. Programs that do not provide tutors with structured materials, extensive professional development, and in-class coaching, or that use unpaid tutors whose attendance may be sporadic, have not produced the remarkable outcomes typical of other tutoring programs.

Tutoring

As Tables 1 and 2 show, proven tutoring programs produce substantial positive effects on reading and math achievement, and nothing else comes close (see Gersten et al., 2020; Neitzel et al., in press; Nickow et al. 2020; Pellegrini et al., 2021; Wanzek et al., 2016).

Tables 1 and 2 only include results from programs that use teaching assistants, AmeriCorps members (who receive stipends), and unpaid volunteer tutors. I did not include programs that use teachers as tutors, because in the current post-Covid crisis, there is a teacher shortage, so it is unlikely that many certified teachers will serve as tutors. Also, research in both reading and math finds little difference in student outcomes between teachers and teaching assistants or AmeriCorps members, so there is little necessity to hire certified teachers as tutors. Unpaid tutors have not been as effective as paid tutors.

Both one-to-one and one-to-small group tutoring by teaching assistants can be effective. One-to-one is somewhat more effective in reading, on average (Neitzel et al., in press), but in math there is no difference in outcomes between one-to-one and one-to-small group (Pellegrini et al., 2021).

Success for All

Success for All is a whole-school reform approach. A recent review of 17 rigorous studies of Success for All found an effect size of +0.51 for students in the lowest 25% of their grades (Cheung et al., in press). However, such students typically receive one-to-one or one-to-small group tutoring for some time period during grades 1 to 3. Success for All also provides all teachers professional development and materials focusing on phonics in grades K-2 and comprehension in grades 2-6, as well as cooperative learning in all grades, parent support, social-emotional learning instruction, and many other elements. So Success for All is not just a tutoring approach, but tutoring plays a central role for the lowest-achieving students.

Summer School

A recent review of research on summer school by Xie et al. (2020) found few positive effects on reading or math achievement. In reading, there were two major exceptions, but in both cases the students were in grades K to 1, and the instruction involved one-to-small group tutoring in phonics. In math, none of the summer school studies involving low-achieving students found positive effects.

After School

A review of research on after-school instruction in reading and math found near-zero impacts in both subjects (Kidron & Lindsay, 2014).

Extended Day

A remarkable study of extended-day instruction was carried out by Figlio et al. (2018). Schools were randomly assigned to receive one hour of additional reading instruction each day for a year, or to serve as a control group. The outcomes were positive but quite modest (ES = +0.09) given the considerable expense.

Technology

Studies of computer-assisted instruction and other digital approaches have found minimal impacts for struggling students (Neitzel et al., in press; Pellegrini et al., 2021).

Policy Consequences

The evidence is clear that any effort intended to improve the achievement of students struggling in reading or mathematics should make extensive use of proven tutoring programs. Students who have fallen far behind in reading or math need programs known to make a great deal of difference in a modest time period, so struggling students can move toward grade level, where they can profit from ordinary teaching. In our current crisis, it is essential that we follow the evidence to give struggling students the best possible chance of success.

References

Cheung, A., Xie, C., Zhang, T., Neitzel, A., & Slavin, R. E. (in press). Success for All: A quantitative synthesis of evaluations. Journal of Research on Educational Effectiveness.

Figlio, D., Holden, K., & Ozek, U. (2018). Do students benefit from longer school days? Regression discontinuity evidence from Florida’s additional hour of literacy instruction. Economics of Education Review, 67, 171-183.

Gersten, R., Haymond, K., Newman-Gonchar, R., Dimino, J., & Jayanthi, M. (2020). Meta-analysis of the impact of reading interventions for students in the primary grades. Journal of Research on Educational Effectiveness, 13(2), 401–427.

Kidron, Y., & Lindsay, J. (2014). The effects of increased learning time on student academic and nonacademic outcomes: Findings from a meta‑analytic review (REL 2014-015). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Appalachia.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (2021). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open, 7 (1), 1-29.

Wanzek, J., Vaughn, S., Scammacca, N., Gatlin, B., Walker, M. A., & Capin, P. (2016). Meta-analyses of the effects of tier 2 type reading interventions in grades K-3. Educational Psychology Review, 28(3), 551–576. doi:10.1007/s10648-015-9321-7

Xie, C., Neitzel, A., Cheung, A., & Slavin, R. E. (2020). The effects of summer programs on K-12 students’ reading and mathematics achievement: A meta-analysis. Manuscript submitted for publication.


Building Back Better

Yesterday, President Joe Biden took his oath of office. He is taking office at one of the lowest points in all of American history. Every American, whatever their political beliefs, should be wishing him well, because his success is essential for the recovery of our nation.

In education, most schools remain closed or partially open, and students are struggling with remote learning. My oldest granddaughter is in kindergarten. Every school day, she receives instruction from a teacher she has never met. She has never seen the inside of “her school.” She is lucky, of course, because she has educators as grandparents (us), but it is easy to imagine the millions of kindergartners who do not even have access to computers, or do not have help in learning to read and learning mathematics. These children will enter first grade with very little of the background they need, in language and school skills as well as in content.

Of course, the problem is not just kindergarten. All students have missed a lot of school, and they will vary widely in their experiences during that time. Think of second graders who essentially missed first grade. Students who missed the year when they are taught biology. Students who missed the fundamentals of creative writing. Students who should be in Algebra 2, except that they missed Algebra 1.

Hopefully, providing vaccines as quickly as possible to school staffs will enable most schools to open this spring. But we have a long, long way to go to get back to normal, especially with disadvantaged students. We cannot just ask students on their first day back to open their math books to the page they were on in March, 2020, when school closed.

Students need to be assessed when they return, and if they are far behind in reading or math, given daily tutoring, one-to-one or one-to-small group. If you follow this blog, you’ve heard me carry on at length about this.

Tutoring services, using tutoring programs proven to be effective, will be of enormous help to students who are far behind grade level (here, here, here). But the recovery from Covid-19 school closures should not be limited to repairing the losses. Instead, I hope the Covid-19 crisis can be an opportunity to reconsider how to rebuild our school system to enhance the school success of all students.

If we are honest with ourselves, we know that schooling in America was ailing long before Covid-19. It wasn’t doing so badly for middle class children, but it was failing disadvantaged students. These very same students have suffered disproportionately from Covid-19. So in the process of bringing these children back into school, let’s not stop with getting back to normal. Let’s figure out how to create schools that use the knowledge we have gained over the past 20 years, and knowledge we can develop in the coming years, to transform learning for our most vulnerable children.

Building Back Better

Obviously, the first thing we have to do this spring is reopen schools and make them as healthy, happy, welcoming, and upbeat as possible. We need to make sure that schools are fully staffed and fully equipped. We do need to “build back” before we can “build back better.” But we cannot stop there. Below, I discuss several things that would greatly transform education for disadvantaged students.

1.  Tutoring

Yes, tutoring is the first thing we have to do to build better. Every child who is significantly below grade level needs daily one-to-one or one-to-small group tutoring until they reach a pre-established level of performance in reading and math, depending on grade level.

However, I am not talking about just any tutoring. Not all tutoring works. But there are many programs that have been proven to work, many times. These are the tutoring programs we need to start with as soon as possible, with adequate training resources to ensure student success.

Implementing proven tutoring programs on a massive scale is an excellent “build back” strategy, the most effective and cost-effective strategy we have. However, tutoring should also be the basis for a key “build better” strategy.

2.  Establishing success as a birthright and ensuring it using proven programs of all kinds.

We need to establish adequate reading and mathematics achievement as the birthright of every child. We can debate about what that level might be, but we must hold ourselves accountable for the success of every child. And we need to accomplish this not just by using accountability assessments and hoping for the best, but by providing proven programs to all students who need them for as long as they need them.

As I’ve pointed out in many blogs (here, here, here), we now have many programs proven effective in rigorous experiments and known to improve student achievement (see www.evidenceforessa.org). Every child who is performing below level, and every school serving many children below grade level, should have resources and knowledge to adopt proven programs. Teachers and tutors need to be guaranteed sufficient professional development and in-class coaching to enable them to successfully implement proven programs. Years ago, we did not have sufficient proven programs, so policy makers kept coming up with evidence-free policies, which have just not worked as intended. But now, we have many programs ready for widespread dissemination. To build better, we have to use these tools, not return to near universal use of instructional strategies, materials, and technology that have never been successfully evaluated. Instead, we need to use what works, and to facilitate adoption and effective implementation of proven programs.

3.  Invest in development and evaluation of promising programs.

How is it that in a remarkably short time, scientists were able to develop vaccines for Covid-19, vaccines that promise to save millions of lives? Simple. We invested billions in research, development, and evaluations of alternative vaccines. Effective vaccines are very difficult to make, and the great majority failed. But at this writing, two U.S. vaccines have succeeded, and this is a mighty good start. Now, government is investing massively in rigorous dissemination of these vaccines.

Total spending on all education research dedicated to creating and evaluating educational innovations is a tiny fraction of what has been and will be spent on vaccines. Can anyone believe that, with clear goals and serious resources, it would be impossible to improve reading, math, science, and other outcomes? Of course it could be done. A key element of “building better” could be to substantially scale up use of the proven programs we have now, and to invest in new development and evaluation to make today’s best obsolete, replaced by better and better approaches. The research and evaluation of tutoring proves this could happen, and perhaps a successful rollout of tutoring will demonstrate what proven programs can do in education.

4.  Commit to Success

Education goes from fad to fad, mandate to mandate, without making much progress. In order to “build better,” we all need to commit to finding what works, disseminating it broadly, and then finding even better solutions, until all children are succeeding. This must be a long-term commitment, but if we are investing adequately and see that we are improving outcomes each year, then it is clear we can do it.

With a change of administrations, we are going to hear a lot about hope. Hope is a good start, but it is not a plan. Let’s plan to build back better, and then for the first time in the history of education, make sure our solutions work, for all of our children.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Tutoring Could Change Everything

Starting in the 1990s, futurists and technology fans began to say, “The Internet changes everything.” And eventually, it did. The Internet has certainly changed education, although it is unclear whether these changes have improved educational effectiveness.

Unlike the Internet, tutoring has been around since hunters and gatherers taught their children to hunt and gather. Yet ancient as it is, making one-to-one or small group tutoring widely available in Title I schools could have profound impacts on the most nettlesome problems of education.

            If the National Tutoring Corps proposal I’ve been discussing in recent blogs (here, here, and here) is widely implemented and successful, it could have both obvious and not-so-obvious impacts on many critical aspects of educational policy and practice. In this blog, I’ll discuss these revolutionary and far-reaching impacts.

Direct and Most Likely Impacts

Struggling Students

            Most obviously, if the National Tutoring Corps is successful, it will be because it has had an important positive impact on the achievement of students who are struggling in reading and/or mathematics. At 100,000 tutors, we expect that as many as four million low-achieving students in Title I schools will benefit: about 10% of all U.S. students in grades 1-9, but perhaps 50% of the students in the lowest 20% of their grades.
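For readers who want to check the coverage figures, here is the arithmetic in a few lines. The roughly 40 million enrollment figure for grades 1-9 is an illustrative assumption; the 40-students-per-tutor ratio is implied by 100,000 tutors reaching four million students.

```python
# Rough check of the coverage figures discussed above.
tutors = 100_000
tutored_students = 4_000_000
print(tutored_students // tutors)  # students served per tutor per year

# Enrollment in grades 1-9 (~40 million) is an illustrative assumption.
enrollment_1_9 = 40_000_000
print(f"{tutored_students / enrollment_1_9:.0%}")           # share of all students
print(f"{tutored_students / (enrollment_1_9 * 0.20):.0%}")  # share of the lowest 20%
```

This prints 40, 10%, and 50%, matching the figures in the paragraph above.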

Title I

            In a December 20 tweet, former Houston superintendent Terry Grier suggested: “Schools should utilize all or most of their Title I money to implement tutoring programs…to help K-2 students catch up on lost literacy skills.”

            I’d agree, except that I’d include later grades and math as well as reading if there is sufficient funding. The purpose of Title I is to accelerate the achievement of low-achieving, disadvantaged students. If schools were experienced with implementing proven tutoring programs, and knew them from their own experience to be effective and feasible, why would such programs not become the main focus of Title I funding, as Grier suggests?

Special Education

            Students with specific learning disabilities and other “high-incidence” disabilities (about half of all students in special education) are likely to benefit from structured tutoring in reading or math. If we had proven, reliable, replicable tutoring models, with which many schools will have had experience, then schools might be able to greatly reduce the need for special education for students whose only problem is difficulty in learning reading or mathematics. For students already in special education, their special education teachers may adopt proven tutoring methods themselves, and may enable students with specific learning disabilities to succeed in reading and math, and hopefully to exit special education.

Increasing the Effectiveness of Other Tutoring and Supportive Services

            Schools already have various tutoring programs, including volunteer programs. In schools involved in the National Tutoring Corps, we recommend that tutoring by paid, well-trained tutors go to the lowest achievers in each grade. If schools also have other tutoring resources, they should be concentrated on students who are below grade level, but not struggling as much as the lowest achievers. These additional tutors might use the proven effective programs provided by the National Tutoring Corps, offering a consistent and effective approach to all students who need tutoring. The same might apply to other supportive services offered by the school.

Less Obvious But Critical Impacts

A Model for Evidence-to-Practice

            The success of evidence-based tutoring could contribute to the growth of evidence-based reform more broadly. If the National Tutoring Corps is seen to be effective because of its use of already-proven instructional approaches, this same idea could be used in every part of education in which robust evidence exists. For example, education leaders might reason that if use of evidence-based tutoring approaches had a big effect on students struggling in reading and math, perhaps similar outcomes could be achieved in algebra, or creative writing, or science, or programs for English learners.

Increasing the Amount and Quality of Development and Research on Replicable Solutions to Key Problems in Education

            If the widespread application of proven tutoring models broadly improves student outcomes, then it seems likely that government, private foundations, and perhaps creators of educational materials and software might invest far more in development and research than they do now, to discover new, more effective educational programs.

Reductions in Achievement Gaps

            If it were widely accepted that there are proven and practical means of significantly improving the achievement of low achievers, then there would be no excuse for allowing achievement gaps to continue. Any student performing below the mean could be given proven tutoring and should gain in achievement, reducing gaps between low and high achievers.

Improvements in Behavior and Attendance

            Many of the students who engage in disruptive behavior are those who struggle academically, and therefore see little value in appropriate behavior. The same is true of students who skip school. Tutoring may help prevent behavior and attendance problems, not just by increasing the achievement of struggling students, but also by giving them caring, personalized teaching with a tutor who forms positive relationships with them and encourages attendance and good behavior.

Enhancing the Learning Environment for Students Who Do Not Need Tutoring

            It is likely that a highly successful tutoring initiative for struggling students could enhance the learning environment for the schoolmates of these students who do not need tutoring. This would happen if the tutored students were better behaved and more at peace with themselves, and if teachers did not have to struggle to accommodate a great deal of diversity in achievement levels within each class.

            Of course, all of these predictions depend on Congress funding a national tutoring plan based on the use of proven programs, and on implementation at scale actually producing the positive impacts these programs have so often shown in research. But I hope these predictions will help policy makers and educational leaders realize the potential positive impacts a tutoring initiative could have, and then do what they can to make sure that the tutoring programs are effectively implemented and produce their desired impact. Then, and only then, will tutoring truly change everything.

Clarification:

Last week’s blog, on the affordability of tutoring, stated that a study of Saga Math, in which there was a per-pupil cost of $3,600, was intended as a demonstration, and was not intended to be broadly replicable. What I meant to say is that Saga was never intended to be replicated AT THAT PRICE PER STUDENT. In fact, a much lower-cost version of Saga Math is currently being replicated. I apologize if I caused any confusion.

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Is a National Tutoring Corps Affordable?

Tutoring is certainly in the news these days. The December 30 Washington Post asked its journalists to predict what the top policy issues will be for the coming year. In education, Laura Meckler focused her entire prediction on just one issue: tutoring. In an NPR interview (Kelly, 2020) with John King, U.S. Secretary of Education at the end of the Obama Administration and now President of Education Trust, the topic was how to overcome the losses students are certain to have sustained due to Covid-19 school closures. Dr. King emphasized tutoring, based on its strong evidence base. McKinsey (Dorn et al., 2020) published a report on early evidence about how much students have lost due to the school closures and what to do about it. “What to do” primarily boiled down to tutoring. Earlier articles in Education Week (e.g., Sawchuk, 2020) have also emphasized tutoring as the leading solution. Two bills introduced in the Senate by Senator Coons (D-Delaware) proposed a major expansion of AmeriCorps, mostly to provide tutoring and school health aides to schools suffering from Covid-19 school closures.

            All of this is heartening, but many of these same sources warn that all this tutoring is going to be horrifically expensive and may not happen because we cannot afford it. However, most of these estimates are based on a single, highly atypical example. A Chicago study (Cook et al., 2015) of a Saga (or Match Education) math tutoring program for ninth graders estimated the cost of a full year of one-to-two tutoring (one tutor working with two students) at $3,600 per student, and estimated that at scale the cost could be as low as $2,500 per student. Yet these estimates are unique to this single program in this single study. The McKinsey report applied the lower figure ($2,500 per student) to cost out tutoring for half of all 55 million students in grades K-12. They estimated an annual cost of $66 billion, just for math tutoring!

            Our estimate is that the cost of a robust national tutoring plan would be more like $7.0 billion in 2021-2022. How could these estimates be so different? First, the Saga study was designed as a one-off demonstration that disadvantaged students in high school could still succeed in math. No one expected that Saga Math could be replicated at a per-pupil cost of $3,600 (or $2,500); in fact, a much less expensive form of Saga Math is currently being disseminated. Moreover, there are dozens of cost-effective tutoring programs widely used and evaluated since the 1980s in elementary reading and math. One is our own Tutoring With the Lightning Squad (Madden & Slavin, 2017), which provides tutors in reading for groups of four students and costs about $700 per student per year. There are many proven small-group tutoring programs known to make a substantial difference in reading or math performance (see Neitzel et al., in press; Nickow et al., 2020; Pellegrini et al., in press). These programs, most of which use teaching assistants as tutors, cost more like $1,500 per student, on average, based on the average cost of five tutoring programs used in Baltimore elementary schools (Tutoring With the Lightning Squad, Reading Partners, mClass Tutoring, Literacy Lab, and Springboard).

            Further, it is preposterous to expect to serve 27.5 million students (half of all students in K-12) all in one year. At 40 students per tutor, this would require hiring 687,500 tutors!

            Our proposal (Slavin et al., 2020) for a National Tutoring Corps calls for hiring 100,000 tutors by September 2021 to provide proven one-to-one or (mostly) one-to-small-group tutoring programs to about 4 million students in grades 1-9 in Title I schools. This number of tutors would serve about 21% of Title I students in these grades in 2021-2022, at a cost of roughly $7.0 billion (including administrative costs, development, evaluation, and so on). This is less than what the government of England is spending right now on a national tutoring program, a total of £1 billion, equivalent to about $7.8 billion after accounting for the difference in population.
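The gap between the two estimates can be checked with back-of-envelope arithmetic, using only the figures quoted above (the 40-students-per-tutor ratio is the one stated earlier in this post):

```python
# McKinsey-style scenario: tutor half of all 55 million K-12 students.
students_per_tutor = 40
mckinsey_students = 55_000_000 // 2
print(mckinsey_students // students_per_tutor)  # tutors needed: 687,500

# National Tutoring Corps proposal: 100,000 tutors, ~$7.0 billion total.
corps_students = 100_000 * students_per_tutor   # 4,000,000 students served
print(7_000_000_000 / corps_students)           # about $1,750 per student
# This is consistent with the ~$1,500 average program cost plus overhead,
# not with the $2,500-$3,600 Saga figures the larger estimates assume.
```

In other words, the difference comes from both the number of students served and the per-student cost assumed, not from any hidden accounting.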

            Our plan would gradually increase the numbers of tutors over time, so in later years costs could grow, but they would never surpass $10 billion, much less $66 billion just for math, as estimated by McKinsey.

            In fact, even with all the money in the world, it would not be possible to hire, train, and deploy 687,500 tutors any time soon, at least not tutors using programs proven to work. The task before us is not to just throw tutors into schools to serve lots of kids. Instead, it should be to provide carefully selected tutors with extensive professional development and coaching to enable them to implement tutoring programs that have been proven to be effective in rigorous, usually randomized experiments. No purpose is served by deploying tutors in such large numbers so quickly that we’d have to make serious compromises with the amount and quality of training. Poorly-implemented tutoring would have minimal outcomes, at best.

            I think anyone would agree that insisting on high quality at substantial scale, and then growing from success to success as tutoring organizations build capacity, is a better use of taxpayers’ money than starting too large and too fast, with unproven approaches.

            The apparent enthusiasm for tutoring is wonderful. But misplaced dollars will not ensure the outcomes we so desperately need for so many students harmed by Covid-19 school closures. Let’s invest in a plan based on high-quality implementation of proven programs and then grow it as we learn more about what works and what scales in sustainable forms of tutoring.

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)

References

Cook, P. J., et al. (2015). Not too late: Improving academic outcomes for disadvantaged youth. Available at https://www.ipr.northwestern.edu/documents/working-papers/2015/IPR-WP-15-01.pdf

Dorn, E., et al. (2020). Covid-19 and learning loss: Disparities grow and students need help. New York: McKinsey & Co.

Kelly, M. L. (2020, December 28). Schools face a massive challenge to make up for learning lost during the pandemic. National Public Radio.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of technology-assisted small-group tutoring for struggling readers. Reading & Writing Quarterly: Overcoming Learning Difficulties, 33(4), 327–334. https://doi.org/10.1080/10573569.2016.1255577

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Nickow, A. J., Oreopoulos, P., & Quan, V. (2020). The transformative potential of tutoring for pre-k to 12 learning outcomes: Lessons from randomized evaluations. Boston: Abdul Latif Poverty Action Lab.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (in press). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open.

Sawchuk, S. (2020, August 26). Overcoming Covid-19 learning loss. Education Week 40(2), 6.

Slavin, R. E., Madden, N. A., Neitzel, A., & Lake, C. (2020). The National Tutoring Corps: Scaling up proven tutoring for struggling students. Baltimore: Johns Hopkins University, Center for Research and Reform in Education.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Large-Scale Tutoring Could Fail. Here’s How to Ensure It Does Not.

I’m delighted to see that the idea of large-scale tutoring to combat Covid-19 losses has become prominent enough in the policy world to attract scoffers and doubters. Michael Goldstein and Bowen Paulle (2020) recently published five brief commentaries in The Gadfly, warning about how tutoring could fail, both questioning the underlying research on tutoring outcomes (maybe just publication bias?) and noting the difficulties of rapid scale-up. They also quote, without citation, a comment by Andy Rotherham, who quite correctly notes past disasters when government has tried and failed to scale up promising strategies: “Ed tech, class size reduction, teacher evaluations, some reading initiatives, and charter schools.” To these, I would add many others, but perhaps most importantly Supplemental Educational Services (SES), a massive attempt to implement all sorts of after-school and summer-school programs in high-poverty, low-achieving schools, which had near-zero impact, on average.

So if you were feeling complacent that the next hot thing, tutoring, was sure to work, no matter how it’s done, then you have not been paying attention for the past 30 years.

But rather than argue with these observations, I’d like to explain that the plan I’ve proposed, which you will find here, is fundamentally different from any of these past efforts, and if implemented as designed, with adequate funding, is highly likely to work at scale.

1.  Unlike all of the initiatives Rotherham dismisses, unlike SES, unlike just about everything ever used at scale in educational policy, the evidence base for certain specific, well-evaluated programs is solid.  And in our plan, only the proven programs would be scaled.

A little-known but crucial fact: Not all tutoring programs work. The details matter. Our recent reviews of research on programs for struggling readers (Neitzel et al., in press) and math (Pellegrini et al., in press) identify individual tutoring programs that do and do not work, as well as types of tutoring that work well and those that do not.

Our scale-up plan would begin with programs that already have solid evidence of effectiveness, but it would also provide funding and third-party, rigorous evaluations of scaled-up programs without sufficient evidence, as well as new programs, designed to add additional options for schools. New and insufficiently evaluated programs would be piloted and implemented for evaluation, but they would not be scaled up unless they have solid evidence of effectiveness in randomized evaluations.

If possible, in fact, we would hope to re-evaluate even the most successful programs, to make sure they continue to work.

If we stick to repeatedly-proven programs, rigorously evaluated in large randomized experiments, then who cares whether other programs have failed in the past? We will know that the programs being used at scale do work. Also, all this research would add greatly to knowledge about effective and ineffective program components and applications to particular groups of students, so over time, we’d expect the individual programs, and the field as a whole, to gain in the ability to provide proven tutoring approaches at scale.

2.  Scale-up of proven programs can work if we take it seriously. It is true that scale-up has many pitfalls, but I would argue that when scale-up fails, it is for one of two reasons. First, the programs being scaled were not adequately proven in the first place. Second, the funding provided for scale-up was not sufficient to allow the program developers to scale up under the conditions they know full well are necessary. As an example of the latter, programs that provided well-trained and experienced trainers in their initial studies are often forced by insufficient funding to use trainer-of-trainers models with greatly diminished amounts of training during scale-up. As a result, programs that worked at small scale have failed in large-scale replication. This happens all the time, and it is what makes policy experts conclude that nothing works at scale.

However, the lesson they should have learned instead is just that programs proven to work at small scale can succeed if the key factors that made them work at small scale are implemented with fidelity at large scale. If anything less is done in scale-up, you’re taking big risks.

If well-trained trainers are essential, then it is critical to insist on well-trained trainers. If a certain amount or quality of training is essential, it is critical to insist on it, and make sure it happens in every school using a given program. And so on. There is no reason to skimp on the proven recipe.

But aren’t all these trainers and training days and other elements unsustainable?  This is the wrong question. The right one is, how can we make tutoring as effective as possible, to justify its cost?

Tutoring is expensive, but most of the cost is in the salaries of the tutors themselves. As an analogy, consider horse racing. Horse owners pay millions for horses with great potential. Having done so, do you think they skimp on trainers or training? Of course not. In the same way, a hundred teaching-assistant tutors cost roughly $4 million per year in salaries and benefits alone. Let’s say top-quality training for this group costs $500,000 per year, while crummy training costs $50,000. If these figures are in the ballpark, would it be wise to spend $4,500,000 on a terrific tutoring program, or $4,050,000 on a crummy one?
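To make the point concrete, here is the same arithmetic in a few lines, using the illustrative figures from the paragraph above:

```python
salaries = 4_000_000     # ~100 teaching-assistant tutors, salaries + benefits
good_training = 500_000  # top-quality training
crummy_training = 50_000 # bare-bones training

total_good = salaries + good_training
total_crummy = salaries + crummy_training
print(total_good, total_crummy)  # 4500000 4050000

# The premium for great training, as a share of the crummy-training total:
print(f"{(total_good - total_crummy) / total_crummy:.1%}")  # 11.1%
```

Because salaries dominate, far better training raises the total budget by only about 11%.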

Successful scale-up takes place all the time in business. How does Starbucks make sure your experience in every single store is excellent? Simple. They have well-researched, well-specified, obsessively monitored standards and quality metrics for every part of their operation. Scale-up in education can work the same way, and in comparison to the costs of front-line personnel, the cost of great training is only trivially greater than the cost of crummy training.

3.  Ongoing research will, in our proposal, formatively evaluate the entire tutoring effort over time, and development and evaluation will continually add new proven programs.  

Ordinarily, big federal education programs start with all kinds of rules and regulations and funding schemes, and these are announced with a lot of hoopla and local and national meetings to explain the new programs to local educators and leaders. Some sort of monitoring and compliance mechanism is put in place, but otherwise the program steams ahead. Several years later, some big research firm gets a huge contract to evaluate the program. On average, the result is almost always disappointing. Then there’s a political fight about just how disappointing the results are, and life goes on.

The program we have proposed is completely different. First, as noted earlier, the individual programs that are operating at large scale will all be proven effective to begin with, and may be evaluated and proven effective again, using the same methods as those used to validate new programs. Second, new proven programs would be identified and scaled up all the time. Third, numerous studies combining observations, correlational studies, and mini-experiments would be evaluating program variations and impacts with different populations and circumstances, adding knowledge of what is happening at the chalkface and of how and why outcomes vary. This explanatory research would not be designed to decide which programs work and which do not (that would be done in the big randomized studies), but to learn from practice how to improve outcomes for each type of school and application. The idea is to get smarter over time about how to make tutoring as effective as it can be, so when the huge summative evaluation takes place, there will be no surprises. We would already know what is working, and how, and why.

Our National Tutoring Corps proposal is not a big research project, or a jobs program for researchers. The overwhelming focus is on providing struggling students the best tutoring we know how to provide. But using a small proportion of the total allocation would enable us to find out what works, rapidly enough to inform practice. If this were all to happen, we would know more and be able to do more every year, serving more and more struggling students with better and better programs.

So rather than spending a lot of taxpayer money and hoping for the best, we’d make scale-up successful by using evidence at the beginning, middle, and end of the process, to make sure that this time, we really know what we are doing. We would make sure that effective programs remain successful at scale, rather than merely hoping they will.

References

Goldstein, M., & Paulle, B. (2020, Dec. 8). Vaccine-making’s lessons for high-dosage tutoring, Part I. The Gadfly.

Goldstein, M., & Paulle, B. (2020, Dec. 11). Vaccine-making’s lessons for high-dosage tutoring, Part IV. The Gadfly.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (in press). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open.

Original photo by Catherine Carusso, Presidio of Monterey Public Affairs

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

The Details Matter. That’s Why Proven Tutoring Programs Work Better than General Guidelines.

When I was in first grade, my beloved teacher, Mrs. Adelson, introduced a new activity. She called it “phonics.”  In “phonics,” we were given tiny pieces of paper with letters on them to paste onto a piece of paper, to make words. It was a nightmare. Being a boy, I could sooner sprout wings and fly than do this activity without smearing paste and ink all over the place. The little slips of paper stuck to my thumb rather than to the paper. This activity taught me no phonics or reading whatsoever, but did engender a longtime hatred of “phonics,” as I understood it.

Much, much later I learned that phonics was essential in beginning reading, so I got over my phonics phobia. And I learned an important lesson. Even if an activity focuses on an essential skill, this does not mean that just any activity with that focus will work. The details matter.

I’ve had reason to reflect on this early lesson many times recently, as I’ve spoken to various audiences about our National Tutoring Corps plan. Often, people will ask why it is important to use specific proven programs. Why not figure out the characteristics of proven programs, and encourage tutors to use those consensus strategies?

The answer is that because the details matter, tutoring according to agreed-upon practices is not going to be as effective as specific proven programs, on average. Mrs. Adelson had a correct understanding of the importance of phonics in beginning reading, but in the classroom, where the paste hits the page, her phonics strategy was awful. In tutoring, we might come to agreement about factors such as group size, qualifications of tutors, amount of PD, and so on, but dozens of details also have to be right. An effective tutoring program has to get crucial features right, such as the nature and quality of tutor training and coaching, student materials and software, instructional strategies, feedback and correction strategies when students make errors, frequency and nature of assessments, means of motivating and recognizing student progress, means of handling student absences, links between tutors and teachers and between tutors and parents, and much more. Getting any of these strategies wrong could greatly diminish the effectiveness of tutoring.

The fact that a proven program has shown positive outcomes in rigorous experiments supports confidence that the program’s particular constellation of strategies is effective. During any program’s development and piloting, developers have had to experiment with solutions to each of the key elements. They have had many opportunities to observe tutoring sessions, to speak with tutors, to look at formative data, and to decide on specific strategies for each of the problems that must be solved. A teacher or local professional developer has not had the opportunity to try out and evaluate specific components, so even if they have an excellent understanding of the main elements of tutoring, they could use or promote key components that are not effective or may even be counterproductive. There are now many practical, ready-to-implement, rigorously evaluated tutoring programs with positive impacts (Neitzel et al., in press). Why should we be using programs whose effects are unknown, when there are many proven alternatives?

Specificity is of particular importance in small-group tutoring, because very effective small-group methods superficially resemble much less effective methods (see Borman et al., 2001; Neitzel et al., in press; Pellegrini et al., 2020). For example, one-to-four tutoring might look like traditional Title I pullouts, which are far less effective. Some “tutors” teach a class of four no differently than they would teach a class of thirty. Tutoring methods that incorporate computers may also superficially resemble computer-assisted instruction, which is also far less effective. Tutoring derives its unique effectiveness from the ability of the tutor to personalize instruction for each child, to provide unique feedback on the specific problems each student faces. It also depends on close relationships between tutors and students. If the specifics are not carefully trained and implemented with understanding and spirit, small-group tutoring can descend into business-as-usual. Not that ordinary teaching and CAI are ineffective, but to successfully combat the effects of Covid-19 school closures and learning gaps in general, tutoring must be much more effective than similar-looking methods. And it can be, but only if tutors are trained and equipped to provide tutoring that has been proven to be effective.

Individual tutors can and do adapt tutoring strategies to meet the needs of particular students or subgroups, and this is fine if the tutor is starting from a well-specified and proven, comprehensive tutoring program and making modifications for well-justified reasons. But when tutors are expected to substantially invent or interpret general strategies, they may make changes that diminish program effectiveness. All too often, local educators seek to modify proven programs to make them easier to implement, less expensive, or more appealing to various stakeholders, but these modifications may leave out elements essential to program effectiveness.

The national experience with Supplemental Educational Services illustrates how good ideas without an evidence base can go wrong. SES provided mostly after-school programs of all sorts, including various forms of tutoring. But hardly any of these programs had evidence of effectiveness. A review of outcomes of almost 400 local SES grants found reading and math effect sizes near zero, on average (Chappell et al., 2011).

In tutoring, it is essential that every student receiving tutoring gets a program highly likely to measurably improve the student’s reading or mathematics skills. Tutoring is expensive, and tutoring is mostly used with students who are very much at risk. It is critical that we give every tutor and every student the highest possible probability of life-altering improvement. Proven, replicable, well-specified programs are the best way to ensure positive outcomes.

Mrs. Adelson was right about phonics, but wrong about how to teach it. Let’s not make the same mistake with tutoring.

References

Borman, G., Stringfield, S., & Slavin, R.E. (Eds.) (2001).  Title I: Compensatory education at the crossroads.  Mahwah, NJ: Erlbaum.

Chappell, S., Nunnery, J., Pribesh, S., & Hager, J. (2011). A meta-analysis of Supplemental Educational Services (SES) provider effects on student achievement. Journal of Education for Students Placed at Risk, 16(1), 1-23.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (2020). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.org. Manuscript submitted for publication.

Photo by Austrian National Library on Unsplash

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Tutors to Teachers: Could a National Tutoring Corps Help Hard-to-Staff Schools?

We are continuing to work with colleagues on a proposal to the incoming Biden administration: a plan to fund tutors in elementary and secondary schools to work with students who are far behind in reading or math. Today, I wanted to expand on one aspect of our proposal.

As it currently stands, we are proposing that the federal government provide Title I schools with funds to hire tutors, who will be required to have a college degree and experience with children. If this proposal becomes reality, it would include a plan to help schools identify particularly effective tutors and offer them a rapid path to teacher certification.

One assumption behind this part of our proposal is that most tutors will be recent college graduates who do not have teaching certificates, most of whom will have majored in something other than education. Many of the tutors are sure to discover the joys of teaching. At the same time, school leaders are sure to notice that many of their tutors are doing an exceptional job. Our proposal is simply to facilitate a process in which excellent, successful tutors can become teachers.

There are several important advantages to schools and to society of this new source of teacher candidates. First, tutors would be concentrated in high-poverty inner-city or distant rural Title I schools. Such schools typically have difficulty recruiting top candidates, and the top candidates they do attract often come from the local area, frequently graduates of the very schools in which they hope to teach. We have noticed that tutor applicants (with college degrees) usually come from the local area.

Second, schools often struggle to find as many minority candidates as they would like. Tutors, in our experience, better represent the demographics of their schools than do teachers. Among the many college-graduate applicants we typically get for tutoring positions in Baltimore, about 80% are Black, and about 80% of hires have been Black as well. This matches the percentage of Black students in Baltimore City Public Schools, but not of its teachers, only 40% of whom are Black. If our Baltimore experience is typical, hiring tutors and then encouraging and supporting them to pursue a teaching certificate may be one way to bring talented Black teachers into teaching.

We have seen a similar dynamic in majority-Hispanic districts and in rural districts. Local tutors with strong ties to their communities, who have demonstrated their skills as tutors, may be an ideal group from which to recruit applicants whose commitment to teaching in that place is strong, and likely to be lifelong.

Many years ago, we were working on a study of our Success for All program in Baltimore, and in one inner-city school we noticed an extraordinary Black teacher. We got to know her, and discovered that she grew up near the school she taught in, and attended that very school. As a teacher, she could have chosen to live almost anywhere, but she chose to live in the house she grew up in, in an inner-city neighborhood. This was where she wanted to teach, where she wanted to make her contribution to her community. We've encountered many amazing teachers in rural places who are also teaching in the schools they attended.

I cannot say exactly how this part of our tutoring plan will be accomplished, or what its effects might be on the teaching staffs of high-poverty schools. But bringing local college graduates into local schools as tutors and then helping the best of them to become teachers would be an important additional outcome of our National Tutoring Corps plan.

Photo credit: Shenandoah University Office of Marketing and Communications, CC BY-SA 3.0 <https://creativecommons.org/licenses/by-sa/3.0>, via Wikimedia Commons


In Meta-Analyses, Weak Inclusion Standards Lead to Misleading Conclusions. Here’s Proof.

By Robert Slavin and Amanda Neitzel, Johns Hopkins University

In two recent blogs (here and here), I’ve written about Baltimore’s culinary glories: crabs and oysters. My point was just that in both cases, there is a lot you have to discard to get to what matters. But I was of course just setting the stage for a problem that is deadly serious, at least to anyone concerned with evidence-based reform in education.

Meta-analysis has contributed a great deal to educational research and reform, helping readers find out about the broad state of the evidence on practical approaches to instruction and school and classroom organization. Recent methodological developments in meta-analysis and meta-regression, and promotion of the use of these methods by agencies such as IES and NSF, have expanded awareness and use of modern methods.

Yet looking at large numbers of meta-analyses published over the past five years, even up to the present, the quality is highly uneven. That's putting it nicely. The problem is that most meta-analyses in education are far too unselective with regard to the methodological quality of the studies they include. Actually, I've been ranting about this for many years, and along with colleagues, have published several articles on it (e.g., Cheung & Slavin, 2016; Slavin & Madden, 2011; Wolf et al., 2020). But clearly, my colleagues and I are not making enough of a difference.

My colleague, Amanda Neitzel, and I thought of a simple way we could communicate the enormous difference it makes if a meta-analysis accepts studies that contain design elements known to inflate effect sizes. In this blog, we once again use the Kulik & Fletcher (2016) meta-analysis of research on computerized intelligent tutoring, which I critiqued in my blog a few weeks ago (here). As you may recall, the only methodological inclusion standards used by Kulik & Fletcher required that studies use randomized (RCT) or quasi-experimental (QED) designs, and that they have a duration of at least 30 minutes (!!!). However, they included enough information to allow us to determine the effect sizes that would have resulted if they had a) weighted for sample size in computing means, which they did not, and b) excluded studies with various features known to inflate effect size estimates. Here is a table summarizing our findings when we additionally excluded studies containing procedures known to inflate mean effect sizes:

Inclusion criteria (all but the first row weighted for sample size)   Studies   Effect size
All studies, unweighted                                                  50       +0.65
All studies                                                              50       +0.56
Excluding small studies (n<60)                                           27       +0.39
Excluding "local" (researcher/developer-made) measures                   12       +0.10 (n.s.)
Excluding small, brief, and "local" measures                              9       +0.09 (n.s.)

If you follow meta-analyses, this table should be shocking. It starts out with 50 studies and a very large effect size, ES=+0.65. Just weighting the mean for study sample sizes reduces this to +0.56. Eliminating small studies (n<60) cut the number of studies almost in half (n=27) and cut the effect size to +0.39. But the largest reductions are due to excluding “local” measures, which on inspection are always measures made by developers or researchers themselves. (The alternative was “standardized measures.”) By itself, excluding local measures (and weighting) cut the number of included studies to 12, and the effect size to +0.10, which was not significantly different from zero (p=.17). Excluding small, brief, and “local” measures only slightly changes the results, because both small and brief studies almost always use “local” (i.e., researcher-made) measures. Excluding all three, and weighting for sample size, leaves this review with only nine studies and an effect size of +0.09, which is not significantly different from zero (p=.21).
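The mechanics of weighting are simple. Here is a minimal Python sketch with invented, illustrative numbers (not the actual Kulik & Fletcher data) showing how a few small studies with large effects can inflate an unweighted mean:

```python
# Pooled effect size across studies, with and without sample-size weighting.
# The study list is invented for illustration: small studies report big
# effects, as is common in unselective meta-analyses.

def pooled_effect(studies, weighted=True):
    """Mean effect size over (effect_size, sample_size) pairs."""
    if weighted:
        total_n = sum(n for _, n in studies)
        return sum(es * n for es, n in studies) / total_n
    return sum(es for es, _ in studies) / len(studies)

studies = [(0.90, 30), (0.70, 45), (0.30, 400), (0.10, 600)]

print(round(pooled_effect(studies, weighted=False), 2))  # → 0.5
print(round(pooled_effect(studies, weighted=True), 2))   # → 0.22
```

Weighting alone does not remove the bias from researcher-made measures, but it does keep a handful of tiny studies from dominating the mean.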

The estimates at the bottom of the chart represent what we call “selective standards.” These are the standards we apply in every meta-analysis we write (see www.bestevidence.org), and in Evidence for ESSA (www.evidenceforessa.org).

It is easy to see why this matters. Selective standards almost always produce much lower estimates of effect sizes than do reviews with much less selective standards, which therefore include studies containing design features that have a strong positive bias on effect sizes. Consider how this affects mean effect sizes in meta-analyses. For example, imagine a study that uses two measures of achievement. One is a measure made by the researcher or developer specifically to be "sensitive" to the program's outcomes. The other is a test independent of the program, such as GRADE/GMADE or Woodcock, which are standardized tests, though not necessarily state tests. Imagine that the researcher-made measure obtains an effect size of +0.30, while the independent measure has an effect size of +0.10. A less-selective meta-analysis would report a mean effect size of +0.20, a respectable-sounding impact. But a selective meta-analysis would report an effect size of +0.10, a very small impact. Which of these estimates represents an outcome with meaning for practice? Clearly, school leaders should not value the +0.30 or +0.20 estimates, which require use of a test designed to be "sensitive" to the treatment. They should care about the gains on the independent test, which represents what educators are trying to achieve and what they are held accountable for. The information from the researcher-made test may be valuable to the researchers, but it has little or no value to educators or students.
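The arithmetic of the example above can be made concrete in a few lines of Python. The measure names and the "independent" flag are hypothetical, just to illustrate the two review policies:

```python
# One hypothetical study reporting two measures of the same outcome.
measures = [
    {"name": "researcher-made test", "independent": False, "es": 0.30},
    {"name": "independent standardized test", "independent": True, "es": 0.10},
]

# Unselective review: average every measure the study reports.
unselective = sum(m["es"] for m in measures) / len(measures)

# Selective review: keep only measures independent of the program.
kept = [m["es"] for m in measures if m["independent"]]
selective = sum(kept) / len(kept)

print(round(unselective, 2))  # → 0.2
print(round(selective, 2))    # → 0.1
```

The unselective reviewer reports a respectable-sounding +0.20; the selective reviewer reports the +0.10 that actually reflects independent measurement.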

The point of this exercise is to illustrate that in meta-analyses, choices of methodological exclusions may entirely determine the outcomes. Had they chosen other exclusions, the Kulik & Fletcher meta-analysis could have reported any effect size from +0.09 (n.s.) to +0.65 (p<.001).

The importance of these exclusions is not merely academic. Think how you'd explain the table above to your sister the principal:

            Principal Sis: I’m thinking of using one of those intelligent tutoring programs to improve achievement in our math classes. What do you suggest?

            You:  Well, it all depends. I saw a review of this in the top journal in education research. It says that if you include very small studies, very brief studies, and studies in which the researchers made the measures, you could have an effect size of +0.65! That’s like seven additional months of learning!

            Principal Sis:  I like those numbers! But why would I care about small or brief studies, or measures made by researchers? I have 500 kids, we teach all year, and our kids have to pass tests that we don’t get to make up!

            You (sheepishly):  I guess you’re right, Sis. Well, if you just look at the studies with large numbers of students, which continued for more than 12 weeks, and which used independent measures, the effect size was only +0.09, and that wasn’t even statistically significant.

            Principal Sis:  Oh. In that case, what kinds of programs should we use?

From a practical standpoint, study features such as small samples or researcher-made measures add a lot to effect sizes while adding nothing to the value to students or schools of the programs or practices they want to know about. They just add a lot of bias. It's like trying to convince someone that corn on the cob is a lot more valuable than corn off the cob, because you get so much more quantity (by weight or volume) for the same money with corn on the cob.

Most published meta-analyses only require that studies have control groups, and some do not even require that much. Few exclude researcher- or developer-made measures, or very small or brief studies. The result is that effect sizes in published meta-analyses are very often implausibly large.

Meta-analyses that include studies lacking control groups or studies with small samples, brief durations, pretest differences, or researcher-made measures report overall effect sizes that cannot be fairly compared to other meta-analyses that excluded such studies. If outcomes do not depend on the power of the particular program but rather on the number of potentially biasing features they did or did not exclude, then outcomes of meta-analyses are meaningless.

It is important to note that these two examples are not at all atypical. As we have begun to look systematically at published meta-analyses, most of them fail to exclude or control for key methodological factors known to contribute a great deal of bias. Something very serious has to be done to change this. Also, I’d remind readers that there are lots of programs that do meet strict standards and show positive effects based on reality, not on including biasing factors. At www.evidenceforessa.org, you can see more than 120 reading and math programs that meet selective standards for positive impacts. The problem is that in meta-analyses that include studies containing biasing factors, these truly effective programs are swamped by a blizzard of bias.

In my recent blog (here) I proposed a common set of methodological inclusion criteria that I would think most methodologists would agree to. If these (or a similar consensus list) were consistently used, we could make more valid comparisons both within and between meta-analyses. But as long as inclusion criteria remain highly variable from meta-analysis to meta-analysis, then all we can do is pick out the few that do use selective standards, and ignore the rest. What a terrible waste.

References

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45 (5), 283-292.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Slavin, R. E., & Madden, N. A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4, 370-380.

Wolf, R., Morrison, J.M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness. DOI: 10.1080/19345747.2020.1726537

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)
