Lessons for Educational Research from the COVID-19 Vaccines

Since the beginning of the COVID-19 pandemic, more than 130 biotech companies have launched major efforts to develop and test vaccines. Only four have been approved so far (Pfizer, Moderna, Johnson & Johnson, and AstraZeneca). Among the others, many have outright failed, and the rest are considered highly unlikely to succeed. Some of the failed efforts came from small, fringe companies, but they also include some of the largest and most successful drug companies in the world: Merck (U.S.), GlaxoSmithKline (U.K.), and Sanofi (France).

Kamala Harris gets her vaccine.

Photo courtesy of NIH

If no further companies succeed, the score is something like 4 successes and 126 failures. Based on this, is the COVID vaccine effort a triumph of science, or a failure? Obviously, if you believe that even one of the successful programs is truly effective, you would have to agree that this is one of the most extraordinary successes in the history of medicine. In less than a year, companies were able to create, evaluate, and roll out successful vaccines, already saving hundreds of thousands of lives worldwide.

Meanwhile, Back in Education . . .

The example of COVID vaccines contrasts sharply with the way research findings are treated in education. As one example, Borman et al. (2003) reviewed research on 33 comprehensive school reform programs. Only three of these had solid evidence of effectiveness, according to the authors (one of these was our program, Success for All; see Cheung et al., in press). Actually, few of the programs failed; most had just not been evaluated adequately. Yet the response from government and educational leaders was “comprehensive school reform doesn’t work” rather than, “How wonderful! Let’s use the programs proven to work.” As a result, a federal program supporting comprehensive school reform was canceled, use of comprehensive school reform plummeted, and most CSR programs went out of operation (we survived, just barely, but the other two successful programs soon disappeared).

Similarly, the What Works Clearinghouse, and our Evidence for ESSA website (www.evidenceforessa.org), are often criticized because so few of the programs we review turn out to have significant positive outcomes in rigorous studies.

The reality is that in any field in which rigorous experiments are used to evaluate innovations, most of the innovations fail. Mature, science-focused fields like medicine and agriculture expect this and honor it, because the only way to prevent failures is to do no experiments at all, or only flawed ones. Without rigorous experiments, we would have no reliable successes. We also learn from failures, as scientists are learning from the evaluations of all 130 of the COVID vaccine candidates.

Unfortunately, education is not a mature science-focused field, and in our field, failure to show positive effects in rigorous experiments leads to cover-ups, despair, abandonment of proven and promising approaches, or abandonment of rigorous research itself. About 20 years ago, a popular federally-funded education program was found to be ineffective in a large, randomized experiment. Supporters of this program actually got Congress to enact legislation that forbade the use of randomized experiments to evaluate this program!

Research has improved in the past two decades, and acceptance of research has improved as well. Yet we are a long way from medicine, for example, which accepts both success and failure as part of a process of using science to improve health. In our field, we need to commit to broad scale, rigorous evaluations of promising approaches, wide dissemination of programs that work, and learning from experiments that do not (yet) show positive outcomes. In this way, we could achieve the astonishing gains that take place in medicine, and learn how to produce these gains even faster using all the knowledge acquired in experiments, successful or not.

References

Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125-230.

Cheung, A., Xie, C., Zhang, T. & Slavin, R. E. (in press). Success for All: A quantitative synthesis of evaluations. Journal of Research on Educational Effectiveness.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Getting Proven Tutoring Programs Into Widespread Practice

Over the past 20 years, there has been a major increase in the number of educational programs that have been developed, evaluated in rigorous (usually randomized) experiments, found to make a substantial difference in achievement, and then offered to schools by non-profit or for-profit organizations. Educators can easily find out about these proven programs in the federal What Works Clearinghouse, our own Evidence for ESSA website, and other sources. Yet very few of these, no matter how effective, have been widely adopted by schools. In 2015, the Every Student Succeeds Act (ESSA) defined what it means for a program to have strong, moderate, or promising evidence of effectiveness, and encouraged or even (in some cases) incentivized use of these programs. Yet even with this, few of the roughly 120,000 U.S. elementary and secondary schools regularly use any of the more than 120 proven reading or mathematics programs that meet the requirements of Evidence for ESSA and show significant positive effects.

The evidence-to-practice connection in education contrasts sharply with that in, say, medicine, where medications and treatments of all kinds that have been proven in rigorous experiments to cure or prevent diseases are usually widely adopted by practitioners. The recent, rapid development and successful evaluation of Covid-19 vaccines is much in the news right now, but dozens of new drugs and other treatments are validated every year, and are then adopted widely. It is certainly true that vastly more money is invested in the whole evidence-to-practice process in medicine than in education, but even when educational programs are found effective in randomized experiments like those required in medicine, these programs rarely enter large-scale use. Further, evidence-to-practice is common in many other fields, such as agriculture and technology. But not education.

How Tutoring Might Change Evidence-to-Practice in Education

One of the problems of evidence-to-practice in education is that we lack clear examples where programs were proven effective and then universally applied and found effective at scale. For example, evidence-to-practice was haphazard in medicine until the 20th century, when sulfa drugs, penicillin, morphine, and cures for polio, among many others, solved massive societal problems, and established the idea that research in medicine could truly bring about major change. These breakthroughs were explicitly engineered to solve health problems of great concern to the public, just as the Covid-19 vaccines were explicitly engineered to solve the pandemic.

In education, we face similar problems in the post-Covid period. Millions of students have fallen far behind due to the closure of their schools. All sorts of solutions have been proposed, but only one, tutoring, has both a solid and substantial research base and a significant number of proven, practical, cost-effective solutions.

Perhaps this is our penicillin/polio/Covid moment. We face a problem that no one can deny, of a desperate need to enable millions of students who have lost ground in the pandemic to rapidly gain in reading and math achievement. The American Rescue Plan Act (ARPA) has provided billions of dollars to solve the problem, so for once, lack of money is not an obstacle. Due to developments in educational research over the past 20 years, we now have a set of tutoring models that have been proven effective in randomized experiments. If these programs can be rapidly scaled up and applied to enough students to make a meaningful difference among struggling students nationwide, then this may serve as the example we need to establish that development, research, and dissemination can solve societal problems in education.

ProvenTutoring

My colleagues and I are organizing a demonstration exactly along these lines. Fourteen proven tutoring programs for reading and mathematics have formed a coalition, which we call ProvenTutoring. The purpose of the coalition is to make the case for proven tutoring programs, and then provide schools, districts, and states a choice of proven programs. Whichever programs are selected will then provide tutors with first-class professional development to ensure that tutoring is done well and that students receive the maximum benefit possible.

The fourteen programs had to be proven to succeed with college-educated teaching assistants as tutors (because requiring certified teachers would be impossible in a time of teacher shortages, and because evidence finds that well-supported teaching assistants get results as good as those obtained by certified teachers). Each program also had to have the capacity to go to substantial scale. If all goes well, we estimate that the coalition can collectively support 100,000 tutors, who could serve about four million children.
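As a back-of-envelope check on these numbers (the group size and tutor caseload below are illustrative assumptions, not figures stated by the coalition):

```python
# Rough capacity check for the ProvenTutoring estimate.
# Group size and annual caseload are assumed for illustration only.

tutors = 100_000            # tutors the coalition estimates it can support
students_per_group = 4      # one-to-small-group tutoring (up to 4 students)
groups_per_tutor = 10       # assumed groups served per tutor over a year

students_served = tutors * students_per_group * groups_per_tutor
print(f"{students_served:,}")  # 4,000,000 -- "about four million children"
```

Under these assumptions, each tutor reaches about 40 students per year, which is what makes the four-million figure plausible at 100,000 tutors.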

ProvenTutoring will soon launch a website (which we will call ProvenTutoring.org) and a nationwide communications campaign to focus schools on proven tutoring as a key part of their post-Covid-19 plans to combat learning losses. We will offer school districts webinars on proven tutoring, and allow users to explore specific programs to make informed choices among them. We will maintain rigorous standards of training and implementation, to make sure that the quality of implementation in practice, at scale, is no less than in the controlled experiments that established each program’s impact.

How Successful Tutoring Could Lead to Support for Evidence-to-Practice

When districts, states, and the nation evaluate reading and math outcomes of struggling students in schools that adopted proven tutoring programs, in comparison to schools that used their ARPA money on other approaches, the outcomes should be clear. If they are, this will be a wonderful development for struggling students, and a major boost for tutoring as an intervention. However, it may also provide the example educational research needs to establish its capacity to solve big practical problems.

If ProvenTutoring is as effective as we expect, perhaps it will occur to our profession that this same strategy could apply wherever we want. This could lead to accelerated investment by government in development and evaluation of replicable programs in every crucial area of education, where robust solutions are needed. Someday, could there be ProvenAlgebra.org? ProvenScience.org? ProvenGraduation.org? Proven programs for English learners? ProvenPreschool.org? ProvenClassroomManagement.org? ProvenCivics.org? Whatever problems most need to be solved, there is no reason we cannot solve them using the same evidence-to-practice strategies that medicine, agriculture, and other fields have used so successfully for many decades.

Photo courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action


The American Rescue Plan Can Rescue Education, If We Use It to Fund What Works

The American Rescue Plan was passed in the U.S. Congress this week. This $1.9 trillion bill provides funding for a lot of things I care about as a citizen, but as an educator, I’d like to focus on the portion of it allocated to healing Covid learning loss. This is $29 billion, or roughly double the usual amount spent annually on Title I. This is a major investment in the students whose educations were harmed the most by Covid school closures. These are mostly disadvantaged students and rural students who could not gain access to remote teaching, or who did not have assistance at home to take advantage of remote instruction. Data from all over the country is showing the educational damage these children have sustained.

Clearly, the new money in the ARP could make a substantial difference in the achievement and adjustment of all students returning to in-person schooling. But if educational research tells us anything at all, it tells us these two things:

  1. Making a big difference in educational outcomes costs money.
  2. However, lots of well-meaning uses of money do not make any perceptible difference in outcomes.

Of course, the only way to tell effective uses of new funds from ineffective uses is through rigorous research.

One of the unusual aspects of the ARP education funding is that the legislation is not very specific about how the money is to be used. This is due in part to the fact that the ARP was passed using a reconciliation procedure that does not allow for much specificity. The U.S. Department of Education will be drafting guidelines for the money soon, but these guidelines are likely to be relatively flexible, because the legislation itself was not.

This flexibility is likely to allow anything from very good uses of money to very poor uses. My guess is that state and district leaders, and individual principals, will have plenty of freedom to use plenty of money. How novel!

I hope states and districts will use this opportunity to clearly define what is most important to accomplish in their post-Covid planning and then insist on choosing programs, practices, and policies based on the best evidence available. This time, educators will have the opportunity to use research-proven programs not because Congress or the U.S. Department of Education tells them to, but because they care about the learning and emotional well-being of their students.

In the period following the passage of the 2015 Every Student Succeeds Act (ESSA), state, district, and building leaders learned how to use services such as the What Works Clearinghouse and our www.evidenceforessa.org website to find out the strength of the evidence supporting various programs. I hope schools will continue to use these resources to select programs that have been proven effective. I’ve written many times about the importance of using proven tutoring programs, and this is indeed the most effective strategy by far for students who are far behind in reading or math. But there are many other approaches proven to be effective, especially for disadvantaged students. There is good evidence of effectiveness not only for classroom approaches to reading and math, but also programs for creative writing, science, social-emotional learning, early childhood education, and much more. The ARP funding allows schools to invest in proven programs and find out for themselves whether they work. ARP money will not be around forever, but wouldn’t it be a great use of the money to find out what works, so that when things return to normal, school and district leaders will know more than ever before what works and what doesn’t for their particular students and their particular schools?

In the first months after all schools open for in-person learning, schools are sure to be thinking in emergency mode, about investments in tutoring and other relatively expensive but highly effective strategies. But the damage Covid has done will have long-lasting impacts, and even if schools use proven tutoring methods to help the students at the greatest risk, it is also important to build for the long haul for all students, using proven programs of all kinds. Wouldn’t it be wonderful if the terrible experience we have all been through leads to a more rational, evidence-driven approach to schooling, creating a lasting benefit not only for today’s children, but for future generations who will receive better educations than they would have before Covid?


The Role of Research and Development in Post-Covid Education

Everyone knows that during World War II, the pace of innovation greatly accelerated. Computers, rocketry, jets, sonar, radar, microwaves, aerosol cans, penicillin, and morphine were among the many wartime developments. What unites these innovations, of course, is that each was developed to solve an urgent problem important to the war effort, and all of them later turned out to have revolutionary benefits for civilian use. Yet these advances could not have taken place so quickly if not for the urgent need for innovations and the massive resources devoted to them.

Crisis can be the catalyst for innovation.

Today, we face Covid, a dire medical crisis, and investments of massive resources have produced vaccines in record time. However, the Covid pandemic has also created an emergency in education, as millions of children are experiencing educational losses due to school closures. The Institute of Education Sciences (IES) has announced a grants program to respond to the Covid crisis, but at the usual pace, the grants will lead to practical solutions only after many years, when (we fervently hope) the crisis will be over.

I would argue that in this perilous time, research in education should focus on urgent practical problems, in ways that could make a significant difference within, say, the next year or two for students who are far below grade level in essential skills because of Covid school closures, or for other reasons:

1. Tutoring. Yes, of course I was going to start with tutoring. The biggest problem in tutoring is that while we have many proven programs for elementary reading, especially for grades K-3, we have far fewer proven programs ready for prime time in the upper elementary grades, and none at all in middle or high school reading. Studies in England have found positive effects of tutoring in their equivalent of middle school, but none of those programs operate in the U.S. In mathematics, there are few proven tutoring programs in elementary school, just one I know of for middle school, and one for high school.

How could research funding produce new tutoring programs for middle and high school reading, and for math at all grade levels, in such a short time?  Simple. First, there are already tutoring programs for reading and math at all grade levels, but few have been successfully evaluated, or (in most cases) ever evaluated at all in rigorous experiments. So it would be important to fund evaluations of particularly promising programs that are already working at significant scale.

Another means of rapidly discovering effective tutoring programs would be to fund developers of programs that have been successful at certain grade levels to quickly create versions for adjacent grades. For example, a program proven effective in grades 2-3 should be able to be significantly modified to work in grades 4-5. One that works in grades 4-5 could be modified for use in middle school. Programs proven effective in reading might be modified for use in mathematics at the same grade level, or vice versa. Many organizations with successful programs at some grade levels have the staff and experience to quickly create programs for adjacent grade levels.

Also, it might be possible for developers of successful classwide technology programs to create and pilot tutoring models using similar software, but adding the assistance of a tutor for groups of one to four students, perhaps in collaboration with experts on tutoring.

2. Approaches other than tutoring. Many reading and math programs of all kinds, not just tutoring, have proven their effectiveness (see www.evidenceforessa.org). Some of these might be ready to go as they are, and others could be evaluated in a form appropriate to the current emergency. Very few programs other than tutoring obtain effect sizes like those typical of the best tutoring programs, but classwide programs with modest effect sizes serve many more students than tutoring programs do. Also, classroom programs might be evaluated for their capacity to maintain gains made due to tutoring.

Tutoring or non-tutoring programs that already exist at scale, or that could be quickly adapted from proven programs, might be ready for rigorous, third-party evaluation as soon as fall 2021, with all programs at a given grade level evaluated using identical procedures and measures. In this way, it should be possible to have many new, proven programs by the end of the 2021-2022 school year, ready for dissemination in fall 2022. This would be in time to greatly add capacity to serve the millions of students who need proven programs to help them make rapid progress toward grade level.

A research program of this kind could be expensive, and it may not provide theoretical breakthroughs. However, given the substantial and obvious need, and the apparent willingness of government to provide major resources to combat Covid learning losses, such a research effort might be feasible. If it were to take place, it might build excitement about R & D as a practical means of enhancing student achievement. And if even a quarter of the experiments found sizable positive impacts, this would add substantially to our armamentarium of proven strategies for struggling students.

There is an old saying in social work: “Never let a good crisis go to waste.” As in World War II, the educational impacts of the Covid pandemic present educational research with a crisis that we must solve, but if we can solve any portion of this problem, this will create benefits for generations of children long after Covid has faded into a distant memory.

Photo credit: User Messybeast on en.wikipedia, CC BY-SA 3.0 <http://creativecommons.org/licenses/by-sa/3.0/>, via Wikimedia Commons


Building Back Better

Yesterday, President Joe Biden took his oath of office. He is taking office at one of the lowest points in all of American history. Every American, whatever their political beliefs, should be wishing him well, because his success is essential for the recovery of our nation.

In education, most schools remain closed or partially open, and students are struggling with remote learning. My oldest granddaughter is in kindergarten. Every school day, she receives instruction from a teacher she has never met. She has never seen the inside of “her school.” She is lucky, of course, because she has educators as grandparents (us), but it is easy to imagine the millions of kindergartners who do not even have access to computers, or do not have help in learning to read and learning mathematics. These children will enter first grade with very little of the background they need, in language and school skills as well as in content.

Of course, the problem is not just kindergarten. All students have missed a lot of school, and they will vary widely in their experiences during that time. Think of second graders who essentially missed first grade. Students who missed the year when they are taught biology. Students who missed the fundamentals of creative writing. Students who should be in Algebra 2, except that they missed Algebra 1.

Hopefully, providing vaccines as quickly as possible to school staffs will enable most schools to open this spring. But we have a long, long way to go to get back to normal, especially with disadvantaged students. We cannot just ask students on their first day back to open their math books to the page they were on in March, 2020, when school closed.

Students need to be assessed when they return, and if they are far behind in reading or math, given daily tutoring, one-to-one or one-to-small group. If you follow this blog, you’ve heard me carry on at length about this.

Tutoring services, using tutoring programs proven to be effective, will be of enormous help to students who are far behind grade level (here, here, here). But the recovery from Covid-19 school closures should not be limited to repairing the losses. Instead, I hope the Covid-19 crisis can be an opportunity to reconsider how to rebuild our school system to enhance the school success of all students.

If we are honest with ourselves, we know that schooling in America was ailing long before Covid-19. It wasn’t doing so badly for middle class children, but it was failing disadvantaged students. These very same students have suffered disproportionately from Covid-19. So in the process of bringing these children back into school, let’s not stop with getting back to normal. Let’s figure out how to create schools that use the knowledge we have gained over the past 20 years, and knowledge we can develop in the coming years, to transform learning for our most vulnerable children.

Building Back Better

Obviously, the first thing we have to do this spring is reopen schools and make them as healthy, happy, welcoming, and upbeat as possible. We need to make sure that schools are fully staffed and fully equipped. We do need to “build back” before we can “build back better.” But we cannot stop there. Below, I discuss several things that would greatly transform education for disadvantaged students.

1.  Tutoring

Yes, tutoring is the first thing we have to do to build better. Every child who is significantly below grade level needs daily one-to-small group or one-to-one tutoring, until they reach a pre-established level of performance, depending on grade level, in reading and math.

However, I am not talking about just any tutoring. Not all tutoring works. But there are many programs that have been proven to work, many times. These are the tutoring programs we need to start with as soon as possible, with adequate training resources to ensure student success.

Implementing proven tutoring programs on a massive scale is an excellent “build back” strategy, the most effective and cost-effective strategy we have. However, tutoring should also be the basis for a key “build better” strategy.

2.  Establishing success as a birthright and ensuring it using proven programs of all kinds.

We need to establish adequate reading and mathematics achievement as the birthright of every child. We can debate about what that level might be, but we must hold ourselves accountable for the success of every child. And we need to accomplish this not just by using accountability assessments and hoping for the best, but by providing proven programs to all students who need them for as long as they need them.

As I’ve pointed out in many blogs (here, here, here), we now have many programs proven effective in rigorous experiments and known to improve student achievement (see www.evidenceforessa.org). Every child who is performing below grade level, and every school serving many children below grade level, should have the resources and knowledge to adopt proven programs. Teachers and tutors need to be guaranteed sufficient professional development and in-class coaching to enable them to successfully implement proven programs. Years ago, we did not have sufficient proven programs, so policy makers kept coming up with evidence-free policies, which have just not worked as intended. But now, we have many programs ready for widespread dissemination. To build better, we have to use these tools, not return to near-universal use of instructional strategies, materials, and technology that have never been successfully evaluated. Instead, we need to use what works, and to facilitate adoption and effective implementation of proven programs.

3.  Invest in development and evaluation of promising programs.

How is it that in a remarkably short time, scientists were able to develop vaccines for Covid-19, vaccines that promise to save millions of lives? Simple. We invested billions in research, development, and evaluation of alternative vaccines. Effective vaccines are very difficult to make, and the great majority failed. But at this writing, two U.S. vaccines have succeeded, and this is a mighty good start. Now, government is investing massively in rapid distribution of these vaccines.

Total spending on all of education research dedicated to creating and evaluating educational innovations is a tiny fraction of what has been and will be spent on vaccines. But can anyone imagine that it would be impossible to improve reading, math, science, and other outcomes, given clear goals and serious resources? Of course it could be done. A key element of “building better” could be to substantially scale up use of the proven programs we have now, and to invest in new development and evaluation to make today’s best obsolete, replaced by better and better approaches. The research and evaluation of tutoring proves this could happen, and perhaps a successful rollout of tutoring will demonstrate what proven programs can do in education.

4.  Commit to Success

Education goes from fad to fad, mandate to mandate, without making much progress. In order to “build better,” we all need to commit to finding what works, disseminating it broadly, and then finding even better solutions, until all children are succeeding. This must be a long-term commitment, but if we are investing adequately and see that we are improving outcomes each year, then it is clear we can do it.            

With a change of administrations, we are going to hear a lot about hope. Hope is a good start, but it is not a plan. Let’s plan to build back better, and then for the first time in the history of education, make sure our solutions work, for all of our children.


Tutoring Could Change Everything

Starting in the 1990s, futurists and technology fans began to say, “The Internet changes everything.” And eventually, it did. The Internet has certainly changed education, although it is unclear whether these changes have improved educational effectiveness.

Unlike the Internet, tutoring has been around since hunters and gatherers taught their children to hunt and gather. Yet ancient as it is, making one-to-one or small group tutoring widely available in Title I schools could have profound impacts on the most nettlesome problems of education.

If the National Tutoring Corps proposal I’ve been discussing in recent blogs (here, here, and here) is widely implemented and successful, it could have both obvious and not-so-obvious impacts on many critical aspects of educational policy and practice. In this blog, I’ll discuss these revolutionary and far-reaching impacts.

Direct and Most Likely Impacts

Struggling Students

Most obviously, if the National Tutoring Corps is successful, it will be because it has had an important positive impact on the achievement of students who are struggling in reading and/or mathematics. At 100,000 tutors, we expect as many as four million low-achieving students in Title I schools will benefit: about 10% of all U.S. students in grades 1-9, but, say, 50% of the students in the lowest 20% of their grades.
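The two percentages are consistent with each other, as a quick calculation shows (the total grade 1-9 enrollment figure below is an illustrative assumption, not a number from this post):

```python
# Consistency check of the coverage percentages cited above.
# Total U.S. grade 1-9 enrollment is assumed to be about 40 million.

students_grades_1_9 = 40_000_000   # assumed total enrollment, grades 1-9
tutored = 4_000_000                # students served by 100,000 tutors

share_of_all = tutored / students_grades_1_9     # -> 0.1, "about 10%"
lowest_fifth = 0.20 * students_grades_1_9        # lowest 20% of achievers
share_of_lowest = tutored / lowest_fifth         # -> 0.5, "say, 50%"

print(share_of_all, share_of_lowest)
```

In other words, serving 10% of all students is the same as serving half of the lowest-achieving fifth, if tutoring is targeted entirely at those students.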

Title I

            In a December 20 tweet, former Houston superintendent Terry Grier suggested: “Schools should utilize all or most of their Title I money to implement tutoring programs…to help K-2 students catch up on lost literacy skills.”

            I’d agree, except that I’d include later grades and math as well as reading if there is sufficient funding. The purpose of Title I is to accelerate the achievement of low-achieving, disadvantaged students. If schools were experienced with implementing proven tutoring programs, and knew them from their own experience to be effective and feasible, why would such programs not become the main focus of Title I funding, as Grier suggests?

Special Education

Students with specific learning disabilities and other “high-incidence” disabilities (about half of all students in special education) are likely to benefit from structured tutoring in reading or math. If we had proven, reliable, replicable tutoring models with which many schools had direct experience, schools might greatly reduce the need for special education for students whose only problem is difficulty learning reading or mathematics. For students already in special education, special education teachers could adopt proven tutoring methods themselves, enabling students with specific learning disabilities to succeed in reading and math, and perhaps to exit special education.

Increasing the Effectiveness of Other Tutoring and Supportive Services

            Schools already have various tutoring programs, including volunteer programs. In schools involved in the National Tutoring Corps, we recommend that tutoring by paid, well-trained tutors go to the lowest achievers in each grade. If schools also have other tutoring resources, they should be concentrated on students who are below grade level, but not struggling as much as the lowest achievers. These additional tutors might use the proven effective programs provided by the National Tutoring Corps, offering a consistent and effective approach to all students who need tutoring. The same might apply to other supportive services offered by the school.

Less Obvious But Critical Impacts

A Model for Evidence-to-Practice

            The success of evidence-based tutoring could contribute to the growth of evidence-based reform more broadly. If the National Tutoring Corps is seen to be effective because of its use of already-proven instructional approaches, this same idea could be used in every part of education in which robust evidence exists. For example, education leaders might reason that if use of evidence-based tutoring approaches had a big effect on students struggling in reading and math, perhaps similar outcomes could be achieved in algebra, or creative writing, or science, or programs for English learners.

Increasing the Amount and Quality of Development and Research on Replicable Solutions to Key Problems in Education

            If the widespread application of proven tutoring models broadly improves student outcomes, then it seems likely that government, private foundations, and perhaps creators of educational materials and software might invest far more in development and research than they do now, to discover new, more effective educational programs.

Reductions in Achievement Gaps

If it were widely accepted that there were proven and practical means of significantly improving the achievement of low achievers, there would be no excuse for allowing achievement gaps to continue. Any student performing below the mean could be given proven tutoring and should gain in achievement, reducing gaps between low and high achievers.

Improvements in Behavior and Attendance

            Many of the students who engage in disruptive behavior are those who struggle academically, and therefore see little value in appropriate behavior. The same is true of students who skip school. Tutoring may help prevent behavior and attendance problems, not just by increasing the achievement of struggling students, but also by giving them caring, personalized teaching with a tutor who forms positive relationships with them and encourages attendance and good behavior.

Enhancing the Learning Environment for Students Who Do Not Need Tutoring

            It is likely that a highly successful tutoring initiative for struggling students could enhance the learning environment for the schoolmates of these students who do not need tutoring. This would happen if the tutored students were better behaved and more at peace with themselves, and if teachers did not have to struggle to accommodate a great deal of diversity in achievement levels within each class.

Of course, all of these predictions depend on Congress funding a national tutoring plan based on the use of proven programs, and on implementation at scale actually producing the positive impacts these programs have so often shown in research. But I hope these predictions will help policy makers and educational leaders realize the potential positive impacts a tutoring initiative could have, and then do what they can to make sure that the tutoring programs are effectively implemented and produce their desired impact. Then, and only then, will tutoring truly change everything.

Clarification:

Last week’s blog, on the affordability of tutoring, stated that a study of Saga Math, in which there was a per-pupil cost of $3,600, was intended as a demonstration, and was not intended to be broadly replicable.  However, all I meant to say is that Saga was never intended to be replicated AT THAT PRICE PER STUDENT.  In fact, a much lower-cost version of Saga Math is currently being replicated.  I apologize if I caused any confusion.

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)


Large-Scale Tutoring Could Fail. Here’s How to Ensure It Does Not.

I’m delighted to see that the idea of large-scale tutoring to combat Covid-19 losses has become prominent enough in the policy world to attract scoffers and doubters. Michael Goldstein and Bowen Paulle (2020) recently published five brief commentaries in The Gadfly warning about how tutoring could fail, both questioning the underlying research on tutoring outcomes (maybe just publication bias?) and noting the difficulties of rapid scale-up. They also quote, without citation, a comment by Andy Rotherham, who quite correctly notes past disasters when government has tried and failed to scale up promising strategies: “Ed tech, class size reduction, teacher evaluations, some reading initiatives, and charter schools.” To these I would add many others, but perhaps most importantly Supplementary Educational Services (SES), a massive attempt to implement all sorts of after-school and summer-school programs in high-poverty, low-achieving schools, which had near-zero impact, on average.

So if you were feeling complacent that the next hot thing, tutoring, was sure to work, no matter how it’s done, then you have not been paying attention for the past 30 years.

But rather than argue with these observations, I’d like to explain that the plan I’ve proposed, which you will find here, is fundamentally different from any of these past efforts, and if implemented as designed, with adequate funding, is highly likely to work at scale.

1. Unlike all of the initiatives Rotherham dismisses, unlike SES, unlike just about everything ever used at scale in educational policy, the evidence base for certain specific, well-evaluated programs is solid. And in our plan, only the proven programs would be scaled.

A little-known but crucial fact: not all tutoring programs work. The details matter. Our recent reviews of research on programs for struggling readers (Neitzel et al., in press) and math (Pellegrini et al., in press) identify individual tutoring programs that do and do not work, as well as types of tutoring that work well and those that do not.

Our scale-up plan would begin with programs that already have solid evidence of effectiveness, but it would also provide funding and third-party, rigorous evaluations of scaled-up programs without sufficient evidence, as well as new programs, designed to add additional options for schools. New and insufficiently evaluated programs would be piloted and implemented for evaluation, but they would not be scaled up unless they have solid evidence of effectiveness in randomized evaluations.

If possible, in fact, we would hope to re-evaluate even the most successful evaluated programs, to make sure they work.

If we stick to repeatedly-proven programs, rigorously evaluated in large randomized experiments, then who cares whether other programs have failed in the past? We will know that the programs being used at scale do work. Also, all this research would add greatly to knowledge about effective and ineffective program components and applications to particular groups of students, so over time, we’d expect the individual programs, and the field as a whole, to gain in the ability to provide proven tutoring approaches at scale.

2. Scale-up of proven programs can work if we take it seriously. It is true that scale-up has many pitfalls, but I would argue that when scale-up fails, it is for one of two reasons. First, the programs being scaled were not adequately proven in the first place. Second, the funding provided for scale-up was not sufficient to allow the program developers to scale up under the conditions they know full well are necessary. As an example of the latter, programs that provided well-trained and experienced trainers in their initial studies are often forced by insufficient funding to use trainer-of-trainers models with greatly diminished amounts of training in scale-up. As a result, programs that worked at small scale fail in large-scale replication. This happens all the time, and it is what makes policy experts conclude that nothing works at scale.

However, the lesson they should have learned instead is just that programs proven to work at small scale can succeed if the key factors that made them work at small scale are implemented with fidelity at large scale. If anything less is done in scale-up, you’re taking big risks.

If well-trained trainers are essential, then it is critical to insist on well-trained trainers. If a certain amount or quality of training is essential, it is critical to insist on it, and make sure it happens in every school using a given program. And so on. There is no reason to skimp on the proven recipe.

But aren’t all these trainers and training days and other elements unsustainable?  This is the wrong question. The right one is, how can we make tutoring as effective as possible, to justify its cost?

Tutoring is expensive, but most of the cost is in the salaries of the tutors themselves. As an analogy, consider horse racing. Horse owners pay millions for horses with great potential. Having done so, do you think they skimp on trainers or training? Of course not. In the same way, a hundred teaching assistant tutors cost roughly $4 million per year in salaries and benefits alone. Let’s say top-quality training for this group costs $500,000 per year, while crummy training costs $50,000. If these figures are in the ballpark, would it be wiser to spend $4,500,000 on a terrific tutoring program, or $4,050,000 on a crummy one?
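The point of the comparison can be made concrete with a few lines of arithmetic, using the post’s illustrative ballpark figures:

```python
# Once you are paying tutor salaries, high-quality training is a small
# marginal cost. All figures are the post's illustrative ballpark numbers.

tutor_salaries = 4_000_000   # 100 tutors, salaries and benefits
good_training = 500_000      # top-quality training
cheap_training = 50_000      # crummy training

good_total = tutor_salaries + good_training     # $4,500,000
cheap_total = tutor_salaries + cheap_training   # $4,050,000

premium = (good_total - cheap_total) / cheap_total
print(f"Top-quality training adds only {premium:.1%} to total cost")
```

In other words, the entire difference between terrific and crummy training is about an 11% premium on a budget dominated by salaries.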

Successful scale-up takes place all the time in business. How does Starbucks make sure your experience in every single store is excellent? Simple: they have well-researched, well-specified, obsessively monitored standards and quality metrics for every part of their operation. Scale-up in education can work the same way, and in comparison to the costs of front-line personnel, the cost of great training is only trivially greater than the cost of crummy training.

3.  Ongoing research will, in our proposal, formatively evaluate the entire tutoring effort over time, and development and evaluation will continually add new proven programs.  

Ordinarily, big federal education programs start with all kinds of rules and regulations and funding schemes, and these are announced with a lot of hoopla and local and national meetings to explain the new programs to local educators and leaders. Some sort of monitoring and compliance mechanism is put in place, but otherwise the program steams ahead. Several years later, some big research firm gets a huge contract to evaluate the program. The results are almost always disappointing. Then there’s a political fight about just how disappointing the results are, and life goes on.

 The program we have proposed is completely different. First, as noted earlier, the individual programs that are operating at large scale will all be proven effective to begin with, and may be evaluated and proven effective again, using the same methods as those used to validate new programs. Second, new proven programs would be identified and scaled up all the time. Third, numerous studies combining observations, correlational studies, and mini-experiments would be evaluating program variations and impacts with different populations and circumstances, adding knowledge of what is happening at the chalkface and of how and why outcomes vary. This explanatory research would not be designed to decide which programs work and which do not (that would be done in the big randomized studies), but to learn from practice how to improve outcomes for each type of school and application. The idea is to get smarter over time about how to make tutoring as effective as it can be, so when the huge summative evaluation takes place, there will be no surprises. We would already know what is working, and how, and why.

Our National Tutoring Corps proposal is not a big research project, or a jobs program for researchers. The overwhelming focus is on providing struggling students the best tutoring we know how to provide. But using a small proportion of the total allocation would enable us to find out what works, rapidly enough to inform practice. If this were all to happen, we would know more and be able to do more every year, serving more and more struggling students with better and better programs.

So rather than spending a lot of taxpayer money and hoping for the best, we’d make scale-up successful by using evidence at the beginning, middle, and end of the process, to make sure that this time, we really know what we are doing. We would make sure that effective programs remain successful at scale, rather than merely hoping they will.

References

Goldstein, M., & Paulle, B. (2020, Dec. 8) Vaccine-making’s lessons for high-dosage tutoring, Part 1. The Gadfly.

Goldstein, M., & Paulle, B. (2020, Dec. 11). Vaccine-making’s lessons for high-dosage tutoring, Part IV. The Gadfly.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (in press). Effective programs in elementary mathematics: A best-evidence synthesis. AERA Open.

Original photo by Catherine Carusso, Presidio of Monterey Public Affairs


The Details Matter. That’s Why Proven Tutoring Programs Work Better than General Guidelines.

When I was in first grade, my beloved teacher, Mrs. Adelson, introduced a new activity. She called it “phonics.”  In “phonics,” we were given tiny pieces of paper with letters on them to paste onto a piece of paper, to make words. It was a nightmare. Being a boy, I could sooner sprout wings and fly than do this activity without smearing paste and ink all over the place. The little slips of paper stuck to my thumb rather than to the paper. This activity taught me no phonics or reading whatsoever, but did engender a longtime hatred of “phonics,” as I understood it.

Much, much later I learned that phonics was essential in beginning reading, so I got over my phonics phobia. And I learned an important lesson. Even if an activity focuses on an essential skill, this does not mean that just any activity with that focus will work. The details matter.

I’ve had reason to reflect on this early lesson many times recently, as I’ve spoken to various audiences about our National Tutoring Corps plan. Often, people will ask why it is important to use specific proven programs. Why not figure out the characteristics of proven programs, and encourage tutors to use those consensus strategies?

The answer is that because the details matter, tutoring according to agreed-upon practices is not going to be as effective, on average, as specific proven programs. Mrs. Adelson had a correct understanding of the importance of phonics in beginning reading, but in the classroom, where the paste hits the page, her phonics strategy was awful. In tutoring, we might come to agreement about factors such as group size, qualifications of tutors, amount of professional development, and so on, but dozens of details also have to be right. An effective tutoring program has to get crucial features right, such as the nature and quality of tutor training and coaching, student materials and software, instructional strategies, feedback and correction strategies when students make errors, the frequency and nature of assessments, means of motivating and recognizing student progress, means of handling student absences, links between tutors and teachers and between tutors and parents, and much more. Getting any of these strategies wrong could greatly diminish the effectiveness of tutoring.

The fact that a proven program has shown positive outcomes in rigorous experiments supports confidence that the program’s particular constellation of strategies is effective. During any program’s development and piloting, developers have had to experiment with solutions to each of the key elements. They have had many opportunities to observe tutoring sessions, to speak with tutors, to look at formative data, and to decide on specific strategies for each of the problems that must be solved. A teacher or local professional developer has not had the opportunity to try out and evaluate specific components, so even if they have an excellent understanding of the main elements of tutoring, they could use or promote key components that are not effective or may even be counterproductive. There are now many practical, ready-to-implement, rigorously evaluated tutoring programs with positive impacts (Neitzel et al., in press). Why should we be using programs whose effects are unknown, when there are many proven alternatives?

Specificity is of particular importance in small-group tutoring, because very effective small group methods superficially resemble much less effective methods (see Borman et al., 2001; Neitzel et al., in press; Pellegrini et al., 2020). For example, one-to-four tutoring might look like traditional Title I pullouts, which are far less effective. Some “tutors” teach a class of four no differently than they would teach a class of thirty. Tutoring methods that incorporate computers may also superficially resemble computer assisted instruction, which is also far less effective. Tutoring derives its unique effectiveness from the ability of the tutor to personalize instruction for each child, to provide unique feedback to the specific problems each student faces. It also depends on close relationships between tutors and students. If the specifics are not carefully trained and implemented with understanding and spirit, small-group tutoring can descend into business-as-usual. Not that ordinary teaching and CAI are ineffective, but to successfully combat the effects of Covid-19 school closures and learning gaps in general, tutoring must be much more effective than similar-looking methods. And it can be, but only if tutors are trained and equipped to provide tutoring that has been proven to be effective.

Individual tutors can and do adapt tutoring strategies to meet the needs of particular students or subgroups, and this is fine if the tutor is starting from a well-specified and proven, comprehensive tutoring program and making modifications for well-justified reasons. But when tutors are expected to substantially invent or interpret general strategies, they may make changes that diminish program effectiveness. All too often, local educators seek to modify proven programs to make them easier to implement, less expensive, or more appealing to various stakeholders, but these modifications may leave out elements essential to program effectiveness.

The national experience of Supplementary Educational Services illustrates how good ideas without an evidence base can go wrong. SES provided mostly after-school programs of all sorts, including various forms of tutoring. But hardly any of these programs had evidence of effectiveness. A review of outcomes of almost 400 local SES grants found reading and math effect sizes near zero, on average (Chappell et al., 2011).

In tutoring, it is essential that every student receiving tutoring gets a program highly likely to measurably improve the student’s reading or mathematics skills. Tutoring is expensive, and tutoring is mostly used with students who are very much at risk. It is critical that we give every tutor and every student the highest possible probability of life-altering improvement. Proven, replicable, well-specified programs are the best way to ensure positive outcomes.

Mrs. Adelson was right about phonics, but wrong about how to teach it. Let’s not make the same mistake with tutoring.

References

Borman, G., Stringfield, S., & Slavin, R.E. (Eds.) (2001).  Title I: Compensatory education at the crossroads.  Mahwah, NJ: Erlbaum.

Chappell, S., Nunnery, J., Pribesh, S., & Hager, J. (2011). A meta-analysis of Supplemental Educational Services (SES) provider effects on student achievement. Journal of Education for Students Placed at Risk, 16(1), 1-23.

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (in press). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly.

Pellegrini, M., Neitzel, A., Lake, C., & Slavin, R. (2020). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.com. Manuscript submitted for publication.

Photo by Austrian National Library on Unsplash


How to Make Evidence in Education Make a Difference

By Robert Slavin

I have a vision of how education in the U.S. and the world will begin to make solid, irreversible progress in student achievement. In this vision, school leaders will constantly be looking for the most effective programs, proven in rigorous research to accelerate student achievement. This process of informed selection will be aided by government, which will provide special incentive funds to help schools implement proven programs.

In this imagined future, the fact that schools are selecting programs based on good evidence means that publishers, software companies, professional development companies, researchers, and program developers, as well as government, will be engaged in a constant process of creating, evaluating, and disseminating new approaches to every subject and grade level. As in medicine, developers and researchers will be held to strict standards of evidence, but if they develop programs that meet these high standards, they can be confident that their programs will be widely adopted, and will truly make a difference in student learning.

Discovering and disseminating effective classroom programs is not all we have to get right in education. For example, we also need great teachers, principals, and other staff who are well prepared and effectively deployed. A focus on evidence could help at every step of that process, of course, but improving programs and improving staff are not an either-or proposition. We can and must do both. If medicine, for example, focused only on getting the best doctors, nurses, technicians, and other staff, but medical research and dissemination of proven therapies were underfunded and little heeded, then we’d have great staff prescribing ineffective or possibly harmful medicines and procedures. In agriculture, we could try to attract farmers who are outstanding in their fields, but that alone would not have created the agricultural revolution that has largely solved the problem of hunger in most parts of the world. Instead, decades of research created or identified improvements in seeds, stock, fertilizers, veterinary practices, farming methods, and so on, for all of those outstanding farmers to put into practice.

Returning to education, my vision of evidence-based reform depends on many actions. Because of the central role government plays in public education, government must take the lead. Some of this will cost money, but it would be a tiny proportion of the roughly $600 billion we spend on K-12 education annually at all levels (federal, state, and local). Other actions would cost little or nothing, focusing only on standards for how existing funds are used. Key actions to establish evidence of impact as central to educational decisions are as follows:

1. Invest substantially in practical, replicable approaches to improving outcomes for students, especially achievement outcomes.

Rigorous, high-quality evidence of effectiveness for educational programs has been appearing since about 2006 at a faster rate than ever before, due in particular to investments by the Institute of Education Sciences (IES), Investing in Innovation/Education Innovation Research (i3/EIR), and the National Science Foundation (NSF) in the U.S., and the Education Endowment Foundation in England, but also other parts of government and private foundations. All have embraced rigorous evaluations involving random assignment to conditions, appropriate measures independent of developers or researchers, and, at the higher funding levels, third-party evaluators. These are very important developments, and they have given the research field, educators, and policy makers excellent reasons for confidence that the findings of such research have direct meaning for practice.

One problem is that, as in every applied field that embraces rigorous research, most experiments do not find positive impacts; only about 20% do. The solution is to learn from successes and failures, so that our success rate improves over time. We also need to support a much larger enterprise of development of new solutions to enduring problems of education, in all subjects and grade levels, and to continue to support rigorous evaluations of the most promising of these innovations. In other words, we should not be daunted by the fact that most evaluations do not find positive impacts; instead, we need to increase the success rate by learning from our own evidence, and carry out many more experiments. Even 20% of a very big number is a big number.

2. Improve communications of research findings to researchers, educators, policy makers, and the general public.

Evidence will not make a substantial difference in education until key stakeholders see it as a key to improving students’ success. Improving communications certainly includes making it easy for various audiences to find out which programs and practices are truly effective. But we also need to build excitement about evidence. To do this, government might establish large-scale, widely publicized, certain-to-work demonstrations of the use and outcomes of proven approaches, so that all will see how evidence can lead to meaningful change.

I will be writing in more depth on this topic in future blogs.

3. Set specific standards of evidence, and provide incentive funding for schools to adopt and implement proven practices.

The Every Student Succeeds Act (ESSA) boldly defined “strong,” “moderate,” “promising,” and lower levels of evidence of effectiveness for educational programs, and required use of programs meeting one of the top categories for certain federal funding, especially school improvement funding for low-achieving schools. This certainly increased educators’ interest in evidence, but in practice it is unclear how much it changed practice or outcomes. These standards need to be made more specific. In addition, the standards need to be applied to funding that is clearly discretionary, to help schools adopt new programs, not to add new evidence requirements to traditional funding sources. The ESSA evidence standards have had less impact than hoped for because they mainly apply to school improvement, a longstanding source of federal funding. As a result, many districts and states have fought hard to have the programs they already have declared “effective,” regardless of their actual evidence base. To make evidence popular, it is important to make proven programs available as something extra, a gift to schools and children rather than a hurdle to continuing existing programs.

In coming blogs I’ll write further about how government could greatly accelerate and intensify the process of development, evaluation, communication, and dissemination, so that the entire process can begin to make undeniable improvements in areas of critical importance, demonstrating how evidence can make a difference for students.

Photo credit: Deeper Learning 4 All/(CC BY-NC 4.0)


How Can You Tell When The Findings of a Meta-Analysis Are Likely to Be Valid?

In Baltimore, Faidley’s, founded in 1886, is a much-loved seafood market inside Lexington Market. Faidley’s used to be a real old-fashioned market, with sawdust on the floor and an oyster bar in the center. People lined up behind their favorite oyster shucker. In a longstanding tradition, the oyster shuckers picked oysters out of crushed ice and tapped them with their oyster knives. If an oyster sounded full, they opened it; if not, they discarded it.

I always noticed that the line was longer behind the shucker who was discarding the most oysters. Why? Because everyone knew that the shucker who was pickier was more likely to come up with a dozen fat, delicious oysters, instead of say, nine great ones and three…not so great.

I bring this up today to tell you how to pick full, fair meta-analyses on educational programs. No, you can’t tap them with an oyster knife, but otherwise, the process is similar. You want meta-analysts who are picky about what goes into their meta-analyses. Your goal is to make sure that a meta-analysis produces results that truly represent what teachers and schools are likely to see in practice when they thoughtfully implement an innovative program. If instead you pick the meta-analysis with the biggest effect sizes, you will always be disappointed.

As a special service to my readers, I’m going to let you in on a few trade secrets about how to quickly evaluate a meta-analysis in education.

One very easy way to evaluate a meta-analysis is to look at the overall effect size, probably shown in the abstract. If the overall mean effect size is more than about +0.40, you probably don’t have to read any further. Unless the treatment is tutoring or some other treatment that you would expect to make a massive difference in student achievement, it is rare to find a single legitimate study with an effect size that large, much less an average that large. A very large effect size is almost a guarantee that a meta-analysis is full of studies with design features that greatly inflate effect sizes, not studies with outstandingly effective treatments.
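For readers less familiar with the metric: the effect sizes discussed here are standardized mean differences (Cohen's d), the treatment-control gap in means divided by the pooled standard deviation. A minimal sketch in Python, with made-up numbers (the test scores and the `cohens_d` helper are purely illustrative):

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference: (treatment mean - control mean) / pooled SD."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Hypothetical example: a test with SD = 15 points in both groups.
# A 6-point gain on such a test corresponds to an effect size of +0.40.
d = cohens_d(mean_t=106.0, mean_c=100.0, sd_t=15.0, sd_c=15.0, n_t=100, n_c=100)
print(round(d, 2))  # 0.4
```

Seen this way, +0.40 means moving the average treated student from the 50th to roughly the 66th percentile of the control distribution, a gain that legitimate whole-class interventions rarely achieve.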

Next, go to the Methods section, which will have within it a section on inclusion (or selection) criteria. It should list the types of studies that were or were not accepted into the analysis. Some of the criteria will have to do with the focus of the meta-analysis, specifying, for example, “studies of science programs for students in grades 6 to 12.” But your focus is on the criteria that specify how picky the meta-analysis is. As one example of a picky set of criteria, here are the main ones we use in Evidence for ESSA and in every analysis we write:

  1. Studies had to use random assignment or matching to assign students to experimental or control groups, with schools and students in each specified in advance.
  2. Students assigned to the experimental group had to be compared to very similar students in a control group experiencing business as usual. The experimental and control students must be well matched, within a quarter standard deviation at pretest (ES=+0.25), and attrition (loss of subjects) must be no more than 15% higher in one group than the other at the end of the study. Why? It is essential that experimental and control groups start and remain the same in all ways other than the treatment. Statistical controls for initial differences do not work well when the differences are large.
  3. There must be at least 30 experimental and 30 control students. Analyses of combined effect sizes must control for sample sizes. Why? Evidence finds substantial inflation of effect sizes in very small studies.
  4. The treatments must be provided for at least 12 weeks. Why? Evidence finds major inflation of effect sizes in very brief studies, and brief studies do not represent the reality of the classroom.
  5. Outcome measures must be independent of the program developers and researchers. Usually, this means using national tests of achievement, though not necessarily standardized tests. Why? Research has found that tests made by researchers can inflate effect sizes by double or more, and researcher-made measures do not represent the reality of classroom assessment.
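To make the screening concrete, the five criteria above can be expressed as a simple filter. This is only an illustrative sketch, not Evidence for ESSA's actual tooling; the `Study` fields and thresholds-as-code are assumptions drawn from the list above:

```python
from dataclasses import dataclass

@dataclass
class Study:
    randomized_or_matched: bool   # criterion 1: random assignment or matching
    pretest_es_gap: float         # criterion 2: pretest difference, in SD units
    attrition_gap: float          # criterion 2: differential attrition (proportion)
    n_treatment: int              # criterion 3: experimental sample size
    n_control: int                # criterion 3: control sample size
    duration_weeks: int           # criterion 4: length of treatment
    independent_measure: bool     # criterion 5: outcome not developer/researcher-made

def meets_criteria(s: Study) -> bool:
    """Return True only if a study passes all five inclusion criteria."""
    return (s.randomized_or_matched
            and abs(s.pretest_es_gap) <= 0.25
            and s.attrition_gap <= 0.15
            and s.n_treatment >= 30 and s.n_control >= 30
            and s.duration_weeks >= 12
            and s.independent_measure)

# A tiny, one-week study with a researcher-made test fails the screen;
# a well-matched, semester-long study with an independent test passes.
weak = Study(True, 0.10, 0.05, n_treatment=8, n_control=8,
             duration_weeks=1, independent_measure=False)
strong = Study(True, 0.10, 0.05, n_treatment=60, n_control=58,
               duration_weeks=30, independent_measure=True)
print(meets_criteria(weak), meets_criteria(strong))  # False True
```

A picky meta-analyst, like a picky oyster shucker, discards everything that fails this screen before computing any averages.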

There may be other details, but these are the most important. Note that these standards have a double focus. Each is intended both to minimize bias and to maximize similarity to the conditions faced by schools. What principal or teacher who cares about evidence would be interested in adopting a program evaluated in comparison to a very different control group? Or in a study with few subjects, or a very brief duration? Or in a study that used measures made by the developers or researchers? This set is very similar to what the What Works Clearinghouse (WWC) requires, except #5 (the WWC requires exclusion of “overaligned” measures, but not developer-/researcher-made measures).

If these criteria are all there in the “Inclusion Standards,” chances are you are looking at a top-quality meta-analysis. As a rule, it will have average effect sizes lower than those you’ll see in reviews without some or all of these standards, but the effect sizes you see will probably be close to what you will actually get in student achievement gains if your school implements a given program with fidelity and thoughtfulness.

What I find astonishing is how many meta-analyses do not have standards this high. Among experts, these criteria are not controversial, except for the last one, which shouldn’t be. Yet meta-analyses are often written, and accepted by journals, with much lower standards, thereby producing greatly inflated, unrealistic effect sizes.

As one example, there was a meta-analysis of Direct Instruction programs in reading, mathematics, and language, published in the Review of Educational Research (Stockard et al., 2018). I have great respect for Direct Instruction, which has been doing good work for many years. But this meta-analysis was very disturbing.

The inclusion and exclusion criteria in this meta-analysis did not require experimental-control comparisons, did not require well-matched samples, and did not require any minimum sample size or duration. It was not clear how many of the outcomes measures were made by program developers or researchers, rather than independent of the program.

With these minimal inclusion standards, and a very long time span (back to 1966), it is not surprising that the review found a great many qualifying studies: 528, to be exact. The review also reported extraordinary effect sizes: +0.51 for reading, +0.55 for math, and +0.54 for language. If these effects were all true and meaningful, it would mean that DI is much more effective than one-to-one tutoring, for example.

But don’t get your hopes up. The article included an online appendix that showed the sample sizes, study designs, and outcomes of every study.

First, the authors identified eight experimental designs (plus single-subject designs, which were treated separately). Only two of these would meet anyone’s modern standards of meta-analysis: randomized and matched. The others included pre-post gains (no control group), comparisons to test norms, and other pre-scientific designs.

Sample sizes were often extremely small. Leaving aside single-case experiments, there were dozens of single-digit sample sizes (e.g., six students), often with very large effect sizes. Further, there was no indication of study duration.
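The statistical reason tiny studies produce eye-popping effect sizes is plain sampling error: with six students per group, estimated effect sizes scatter enormously around the true effect, and the large estimates are the ones that get noticed and published. A rough simulation (all numbers hypothetical; the true effect is fixed at +0.20) illustrates the point:

```python
import math
import random

def simulated_d(n, true_d, rng):
    """Estimate Cohen's d from one simulated study with n students per group."""
    treat = [rng.gauss(true_d, 1.0) for _ in range(n)]
    ctrl = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mt, mc = sum(treat) / n, sum(ctrl) / n
    var_t = sum((x - mt) ** 2 for x in treat) / (n - 1)
    var_c = sum((x - mc) ** 2 for x in ctrl) / (n - 1)
    return (mt - mc) / math.sqrt((var_t + var_c) / 2)

rng = random.Random(1)
small = [simulated_d(6, 0.20, rng) for _ in range(2000)]    # 2,000 tiny studies
large = [simulated_d(100, 0.20, rng) for _ in range(2000)]  # 2,000 adequately sized ones

# Tiny studies scatter far more widely around the true +0.20; if only
# the biggest results survive to publication, they look spectacular.
print(f"largest d with n=6 per group:   {max(small):.2f}")
print(f"largest d with n=100 per group: {max(large):.2f}")
```

This is why a minimum sample size (criterion 3 above) matters: it is not that small studies are dishonest, but that their extreme estimates, once selectively reported, inflate a meta-analytic average.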

What is truly astonishing is that RER accepted this study. RER is the top-rated journal in all of education, based on its citation count. Yet this review, and the Kulik & Fletcher (2016) review I cited in a recent blog, clearly did not meet minimal standards for meta-analyses.

My colleagues and I will be working in the coming months to better understand what has gone wrong with meta-analysis in education, and to propose solutions. Of course, our first step will be to spend a lot of time at oyster bars studying how they set such high standards. Oysters and beer will definitely be involved!

Photo credit: Annette White / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

References

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Stockard, J., Wood, T. W., Coughlin, C., & Rasplica Khoury, C. (2018). The effectiveness of Direct Instruction curricula: A meta-analysis of a half century of research. Review of Educational Research, 88(4), 479–507. https://doi.org/10.3102/0034654317751919
