In Meta-Analyses, Weak Inclusion Standards Lead to Misleading Conclusions. Here’s Proof.

By Robert Slavin and Amanda Neitzel, Johns Hopkins University

In two recent blogs (here and here), I’ve written about Baltimore’s culinary glories: crabs and oysters. My point was just that in both cases, there is a lot you have to discard to get to what matters. But I was of course just setting the stage for a problem that is deadly serious, at least to anyone concerned with evidence-based reform in education.

Meta-analysis has contributed a great deal to educational research and reform, helping readers find out about the broad state of the evidence on practical approaches to instruction and school and classroom organization. Recent methodological developments in meta-analysis and meta-regression, and promotion of the use of these methods by agencies such as IES and NSF, have expanded awareness and use of modern methods.

Yet in the large number of meta-analyses published over the past five years, even up to the present, the quality is highly uneven. That's putting it nicely. The problem is that most meta-analyses in education are far too unselective with regard to the methodological quality of the studies they include. Actually, I've been ranting about this for many years and, along with colleagues, have published several articles on it (e.g., Cheung & Slavin, 2016; Slavin & Madden, 2011; Wolf et al., 2020). But clearly, my colleagues and I are not making enough of a difference.

My colleague, Amanda Neitzel, and I thought of a simple way we could communicate the enormous difference it makes if a meta-analysis accepts studies that contain design elements known to inflate effect sizes. In this blog, we once again use the Kulik & Fletcher (2016) meta-analysis of research on computerized intelligent tutoring, which I critiqued in my blog a few weeks ago (here). As you may recall, the only methodological inclusion standards used by Kulik & Fletcher required that studies use RCTs or QEDs, and that they have a duration of at least 30 minutes (!!!). However, they included enough information to allow us to determine the effect sizes that would have resulted if they had a) weighted for sample size in computing means, which they did not, and b) excluded studies with various features known to inflate effect size estimates. Here is a table summarizing our findings when we additionally excluded studies containing procedures known to inflate mean effect sizes:

If you follow meta-analyses, this table should be shocking. It starts out with 50 studies and a very large effect size, ES=+0.65. Just weighting the mean for study sample sizes reduces this to +0.56. Eliminating small studies (n<60) cuts the number of studies almost in half (n=27) and cuts the effect size to +0.39. But the largest reductions come from excluding “local” measures, which on inspection are always measures made by developers or researchers themselves. (The alternative was “standardized measures.”) By itself, excluding local measures (and weighting) cuts the number of included studies to 12 and the effect size to +0.10, which is not significantly different from zero (p=.17). Additionally excluding small and brief studies changes the results only slightly, because small and brief studies almost always use “local” (i.e., researcher-made) measures. Excluding all three, and weighting for sample size, leaves this review with only nine studies and an effect size of +0.09, which is also not significantly different from zero (p=.21).
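
To make the arithmetic concrete, here is a minimal sketch, in Python, of how weighting for sample size and excluding small studies pulls a pooled mean effect size down. The numbers are invented for illustration; they are not the Kulik & Fletcher data.

```python
# Minimal sketch: weighted vs. unweighted mean effect sizes.
# The (effect size, sample size) pairs below are hypothetical.

studies = [
    (1.10, 24),   # tiny study, very large effect
    (0.85, 40),
    (0.55, 90),
    (0.20, 300),
    (0.12, 450),  # large study, modest effect
]

def unweighted_mean(data):
    return sum(es for es, _ in data) / len(data)

def weighted_mean(data):
    total_n = sum(n for _, n in data)
    return sum(es * n for es, n in data) / total_n

large_only = [(es, n) for es, n in studies if n >= 60]

print(f"Unweighted mean ES:      {unweighted_mean(studies):+.2f}")   # +0.56
print(f"Sample-size weighted ES: {weighted_mean(studies):+.2f}")     # +0.25
print(f"Weighted, n >= 60 only:  {weighted_mean(large_only):+.2f}")  # +0.19
```

The particular numbers do not matter; the point is that an unweighted mean is dominated by whatever small studies happen to be included, and weighting for sample size or excluding small studies can cut the apparent impact dramatically.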

The estimates at the bottom of the chart represent what we call “selective standards.” These are the standards we apply in every meta-analysis we write (see www.bestevidence.org), and in Evidence for ESSA (www.evidenceforessa.org).

It is easy to see why this matters. Reviews that apply selective standards almost always report much lower effect sizes than reviews with less selective standards, because the less selective reviews include studies containing design features that strongly bias effect sizes upward. Consider how this affects mean effect sizes in meta-analyses. For example, imagine a study that uses two measures of achievement. One is a measure made by the researcher or developer specifically to be “sensitive” to the program’s outcomes. The other is a test independent of the program, such as the GRADE/GMADE or Woodcock assessments, which are standardized tests, though not necessarily state tests. Imagine that the researcher-made measure obtains an effect size of +0.30, while the independent measure obtains an effect size of +0.10. A less selective meta-analysis would report a mean effect size of +0.20, a respectable-sounding impact. A selective meta-analysis would report an effect size of +0.10, a very small impact. Which of these estimates has meaning for practice? Clearly, school leaders should not value the +0.30 or +0.20 estimates, which depend on a test designed to be “sensitive” to the treatment. They should care about the gains on the independent test, which represents what educators are trying to achieve and what they are held accountable for. The information from the researcher-made test may be valuable to the researchers, but it has little or no value to educators or students.

The point of this exercise is to illustrate that in meta-analyses, choices of methodological exclusions may entirely determine the outcomes. Had they chosen other exclusions, the Kulik & Fletcher meta-analysis could have reported any effect size from +0.09 (n.s.) to +0.65 (p<.001).

The importance of these exclusions is not merely academic. Think how you’d explain the chart above to your sister the principal:

            Principal Sis: I’m thinking of using one of those intelligent tutoring programs to improve achievement in our math classes. What do you suggest?

            You:  Well, it all depends. I saw a review of this in the top journal in education research. It says that if you include very small studies, very brief studies, and studies in which the researchers made the measures, you could have an effect size of +0.65! That’s like seven additional months of learning!

            Principal Sis:  I like those numbers! But why would I care about small or brief studies, or measures made by researchers? I have 500 kids, we teach all year, and our kids have to pass tests that we don’t get to make up!

            You (sheepishly):  I guess you’re right, Sis. Well, if you just look at the studies with large numbers of students, which continued for more than 12 weeks, and which used independent measures, the effect size was only +0.09, and that wasn’t even statistically significant.

            Principal Sis:  Oh. In that case, what kinds of programs should we use?

From a practical standpoint, study features such as small samples or researcher-made measures add a lot to effect sizes while adding nothing to the value, for students or schools, of the programs or practices they want to know about. They just add a lot of bias. It’s like trying to convince someone that corn on the cob is a lot more valuable than corn off the cob, because you get so much more quantity (by weight or volume) for the same money with corn on the cob.

Most published meta-analyses only require that studies have control groups, and some do not even require that much. Few exclude researcher- or developer-made measures, or very small or brief studies. The result is that effect sizes in published meta-analyses are very often implausibly large.

Meta-analyses that include studies lacking control groups, or studies with small samples, brief durations, pretest differences, or researcher-made measures, report overall effect sizes that cannot be fairly compared to those of meta-analyses that excluded such studies. If reported outcomes depend not on the power of the particular programs but on the number of potentially biasing features a review did or did not exclude, then the outcomes of meta-analyses are meaningless.

It is important to note that these two examples are not at all atypical. As we have begun to look systematically at published meta-analyses, we have found that most of them fail to exclude or control for key methodological factors known to contribute a great deal of bias. Something very serious has to be done to change this. I’d also remind readers that there are many programs that do meet strict standards and show positive effects based on reality, not on the inclusion of biasing factors. At www.evidenceforessa.org, you can see more than 120 reading and math programs that meet selective standards for positive impacts. The problem is that in meta-analyses that include studies containing biasing factors, these truly effective programs are swamped by a blizzard of bias.

In my recent blog (here) I proposed a common set of methodological inclusion criteria that I would think most methodologists would agree to. If these (or a similar consensus list) were consistently used, we could make more valid comparisons both within and between meta-analyses. But as long as inclusion criteria remain highly variable from meta-analysis to meta-analysis, all we can do is pick out the few reviews that do use selective standards, and ignore the rest. What a terrible waste.

References

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45 (5), 283-292.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Slavin, R. E., & Madden, N. A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4, 370–380.

Wolf, R., Morrison, J.M., Inns, A., Slavin, R. E., & Risman, K. (2020). Average effect sizes in developer-commissioned and independent evaluations. Journal of Research on Educational Effectiveness. DOI: 10.1080/19345747.2020.1726537

Photo credit: Deeper Learning 4 All, (CC BY-NC 4.0)

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

A “Called Shot” for Educational Research and Impact

In the 1932 World Series, Babe Ruth stepped up to the plate and pointed to the center field fence. Everyone there understood: He was promising to hit the next pitch over the fence.

And then he did.

That one home run established Babe Ruth as the greatest baseball player ever. Even though several others have long since beaten his record of 60 home runs, no one else ever promised to hit a home run and then did it.

Educational research needs to execute a “called shot” of its own. We need to identify a clear problem, one that must be solved with some urgency, one that every citizen understands and cares about, one that government is willing and able to spend serious money to solve. And then we need to solve it, in a way that is obvious to all. I think the clear need for intensive services for students whose educations have suffered due to Covid-19 school closures provides an opportunity for our own “called shot.”

In my recent Open Letter to President-Elect Biden, I described a plan to provide up to 300,000 well-trained college-graduate tutors to work with up to 12 million students whose learning has been devastated by the Covid-19 school closures, or who are far below grade level for any reason. There are excellent reasons to do this, including making a rapid difference in the reading and mathematics achievement of vulnerable children, providing jobs to hundreds of thousands of college graduates who may otherwise be unemployed, and starting the best of these non-certified tutors on a path to teacher certification. These reasons more than justify the effort. But in today’s blog, I wanted to explain a fourth rationale, one that in the long run may be the most important of all.

A major tutoring enterprise, entirely focusing on high-quality implementation of proven programs, could be the “called shot” evidence-based education needs to establish its value to the American public.

Of course, the response to the Covid-19 pandemic is already supporting a “called shot” in medicine: the rush to produce a vaccine. At this time we do not know what the outcome will be, but throughout the world, people are closely following the progress of dozens of prominent attempts to create a safe and effective vaccine to prevent Covid-19. If this effort works as hoped, it will provide enormous benefits for entire populations and economies worldwide. It could also raise the possibility that we can solve many crucial medical problems much faster than we have in the past, without compromising on strict research standards. The funding of many promising alternatives, and rigorous testing of each before it is disseminated, is very similar to what my colleagues and I have proposed for various approaches to tutoring. In both the medical case and the educational case, the size of the problem justifies this intensive, all-in approach. If all goes well with the vaccines, that will be a “called shot” for medicine, but medicine has long since proven its capability to use science to solve big problems. Curing polio, eliminating smallpox, and preventing measles come to mind as examples. In education, we need to earn this confidence, with a “called shot” of our own.

Think of it. Education researchers and leaders who support them would describe a detailed and plausible plan to solve a pressing problem of education. Then we announce that given X amount of money and Y amount of time, we will demonstrate that struggling students can perform substantially better than they would have without tutoring.

We’d know this would work, because part of the process would be identifying a) programs already proven to be effective, b) programs that already exist at some scale and that would then be rigorously and successfully evaluated, and c) newly designed programs that would also be successfully evaluated. In each case, programs would have to meet rigorous evaluation standards before qualifying for substantial scale-up. In addition, in order to obtain funding to hire tutors, schools would have to agree to ensure that tutors use the programs with an amount and quality of training, coaching, and support at least as good as what was provided in the successful studies.

Researchers and policy makers who believe in evidence-based reform could confidently predict substantial gains, and then make good on their promises. No intervention in all of education is as effective as tutoring. Tutoring can be expensive, but it does not require a lengthy, uncertain transformation of the entire school. No sensible researcher or reformer would think that tutoring is all schools should do to improve student outcomes, but tutoring should be one element of any comprehensive plan to improve schools, and it happens to respond to the needs of post-Covid education for something that can have a dramatic, relatively quick, and relatively reliable impact.

If all went well in a large-scale tutoring intervention, the entire field of education research could gain new respect, and educators and the public could come to believe that outcomes can be made much better than they are now by systematic application of research, development, evaluation, and dissemination.

It is important to note that in order to be perceived to work, the tutoring “called shot” need not be proven effective across the board. By my count, there are 18 elementary reading tutoring programs with positive outcomes in randomized evaluations (see below). Let’s say 12 of them are ready for prime time and are put to the test, and 5 of those work very well at scale. That would be a tremendous success, because if we know which five approaches worked, we could make substantial progress on the problem of elementary reading failure. Just as with Covid-19 vaccines, we shouldn’t care how many vaccines failed. All that matters is that one or more of them succeeds, and can then be widely replicated.

I think it is time to do something bold to capture people’s imaginations. Let’s (figuratively) point to the center field fence, and (figuratively) hit the next pitch over it. The conditions today for such an effort are as good as they will ever be, because of universal understanding that the Covid-19 school closures deserve extraordinary investments in proven strategies. Researchers working closely with educators and political leaders can make a huge difference. We just have to make our case and insist on nothing less than whatever it takes. If a “called shot” works for tutoring, perhaps we could use similar approaches to solve other enduring problems of education.

It worked for the Babe. It should work for us, too, with much greater consequences for our children and our society than a mere home run.

*  *  *

Note: A reader of my previous blog asked what specific tutoring programs are proven effective, according to our standards. I’ve listed below reading and math tutoring programs that meet our standards of evidence. I cannot guarantee that all of these programs would be able to go to scale. We are communicating with program providers to try to assess each program’s capacity and interest in going to scale. But these programs are a good place to start in understanding where things stand today.


An Open Letter To President-Elect Biden: A Tutoring Marshall Plan To Heal Our Students

Dear President-Elect Biden:

            Congratulations on your victory in the recent election. Your task is daunting; so much needs to be set right. I am writing to you about what I believe needs to be done in education to heal the damage done to so many children who missed school due to Covid-19 closures.

            I am aware that there are many basic things that must be done to improve schools, which have to continue to make their facilities safe for students and cope with the physical and emotional trauma that so many have experienced. Schools will be opening into a recession, so just providing ordinary services will be a challenge. Funding to enable schools to fulfill their core functions is essential, but it is not sufficient.

            Returning schools to the way they were when they closed last spring will not heal the damage students have sustained to their educational progress. This damage will be greatest to disadvantaged students in high-poverty schools, most of whom were unable to take advantage of the remote learning most schools provided. Some of these students were struggling even before schools closed, but when they re-open, millions of students will be far behind.

            Our research center at Johns Hopkins University studies the evidence on programs of all kinds for students who are at risk, especially in reading (Neitzel et al., 2020) and mathematics (Pellegrini et al., 2020). What we and many other researchers have found is that the most effective strategy for struggling students, especially in elementary schools, is one-to-one or one-to-small group tutoring. Structured tutoring programs can make a large difference in a short time, exactly what is needed to help students quickly catch up with grade level expectations.

A Tutoring Marshall Plan

            My colleagues and I have proposed a massive effort designed to provide proven tutoring services to the millions of students who desperately need it. Our proposal, based on a similar idea by Senator Coons (D-Del), would ultimately provide funding to enable as many as 300,000 tutors to be recruited, trained in proven tutoring models, and coached to ensure their effectiveness. These tutors would be required to have a college degree, but not necessarily a teaching certificate. Research has found that such tutors, using proven tutoring models with excellent professional development, can improve the achievement of students struggling in reading or mathematics as much as can teachers serving as tutors.

            The plan we are proposing is a bit like the Marshall Plan after World War II, which provided substantial funding to Western European nations devastated by the war. The idea was to put these countries on their feet quickly and effectively so that within a brief period of years, they could support themselves. In a similar fashion, a Tutoring Marshall Plan would provide intensive funding to enable Title I schools nationwide to substantially advance the achievement of their students who suffered mightily from Covid-19 school closures and related trauma. Effective tutoring is likely to enable these children to advance to the point where they can profit from ordinary grade-level instruction. We fear that without this assistance, millions of children will never catch up, and will show the negative effects of the school closures throughout their time in school and beyond.

            The Tutoring Marshall Plan will also provide employment to 300,000 college graduates, who will otherwise have difficulty entering the job market in a time of recession. These people are eager to contribute to society and to establish professional careers, but will need a first step on that ladder. Ideally, the best of the tutors will experience the joys of teaching, and might be offered accelerated certification, opening a new source of teacher candidates who will have had an opportunity to build and demonstrate their skills in school settings. Like participants in the CCC and WPA programs of the Great Depression, these tutors will not only be helped to survive the financial crisis, but will perform essential services to the nation while building skills and confidence.

            The Tutoring Marshall Plan needs to start as soon as possible. The need is obvious, both to provide essential jobs to college graduates and to provide proven assistance to struggling students.

            Our proposal, in brief, is to ask the U.S. Congress to fund the following activities:

Spring, 2021

  • Fund existing tutoring programs to build capacity to scale up their programs to serve thousands of struggling students. This would include funds for installing proven tutoring programs in about 2000 schools nationwide.
  • Fund rigorous evaluations of programs that show promise, but have not been evaluated in rigorous, randomized experiments.
  • Fund the development of new programs, especially in areas in which there are few proven models, such as programs for struggling students in secondary schools.

Fall, 2021 to Spring, 2022

  • Provide restricted funds to Title I schools throughout the United States to enable them to hire up to 150,000 tutors to implement proven programs, across all grade levels, 1-9, and in reading and mathematics. This many tutors, mostly using small-group methods, should be able to provide tutoring services to about 6 million students each year. Schools should be asked to agree to select from among proven, effective programs. Schools would implement their chosen programs using tutors who have college degrees and experience with tutoring, teaching, or mentoring children (such as AmeriCorps graduates who were tutors, camp counselors, or Sunday school teachers).
  • As new programs are completed and piloted, third-party evaluators should be funded to evaluate them in randomized experiments, adding to capacity to serve students in grades 1-9. Those programs that produce positive outcomes would then be added to the list of programs available for tutor funding, and their organizations would need to be funded to facilitate preparation for scale-up.
  • Teacher training institutions and school districts should be funded to work together to design accelerated certification programs for outstanding tutors.

Fall, 2022-Spring, 2023

  • Title I schools should be funded to enable them to hire a total of 300,000 tutors. Again, schools will select among proven tutoring programs, which will train, coach, and evaluate tutors across the U.S. We expect these tutors to be able to work with about 12 million struggling students each year.
  • Development, evaluation, and scale-up of proven programs should continue to enrich the number and quality of proven programs adapted to the needs of all kinds of Title I schools.

            The Tutoring Marshall Plan would provide direct benefits to millions of struggling students harmed by Covid-19 school closures, in all parts of the U.S. It would provide meaningful work with a future to college graduates who might otherwise be unemployed. At the same time, it could establish a model of dramatic educational improvement based on rigorous research, contributing to knowledge and use of effective practice. If all goes well, the Tutoring Marshall Plan could demonstrate the power of scaling up proven programs and using research and development to improve the lives of children.

References

Neitzel, A., Lake, C., Pellegrini, M., & Slavin, R. (2020). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. (2020). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.com. Manuscript submitted for publication.


How to Make Evidence in Education Make a Difference

By Robert Slavin

I have a vision of how education in the U.S. and the world will begin to make solid, irreversible progress in student achievement. In this vision, school leaders will constantly be looking for the most effective programs, proven in rigorous research to accelerate student achievement. This process of informed selection will be aided by government, which will provide special incentive funds to help schools implement proven programs.

In this imagined future, the fact that schools are selecting programs based on good evidence means that publishers, software companies, professional development companies, researchers, and program developers, as well as government, will be engaged in a constant process of creating, evaluating, and disseminating new approaches to every subject and grade level. As in medicine, developers and researchers will be held to strict standards of evidence, but if they develop programs that meet these high standards, they can be confident that their programs will be widely adopted, and will truly make a difference in student learning.

Discovering and disseminating effective classroom programs is not all we have to get right in education. For example, we also need great teachers, principals, and other staff who are well prepared and effectively deployed. A focus on evidence could help at every step of that process, of course, but improving programs and improving staff are not an either-or proposition. We can and must do both. If medicine, for example, focused only on getting the best doctors, nurses, technicians, and other staff, while medical research and dissemination of proven therapies were underfunded and little heeded, then we’d have great staff prescribing ineffective or possibly harmful medicines and procedures. In agriculture, we could try to attract farmers who are outstanding in their fields, but that alone would not have created the agricultural revolution that has largely solved the problem of hunger in most parts of the world. Instead, decades of research created or identified improvements in seeds, stock, fertilizers, veterinary practices, farming methods, and so on, for all of those outstanding farmers to put into practice.

Back to education, my vision of evidence-based reform depends on many actions. Because of the central role government plays in public education, government must take the lead. Some of this will cost money, but it would be a tiny proportion of the roughly $600 billion we spend on K-12 education annually, at all levels (federal, state, and local). Other actions would cost little or nothing, focusing only on standards for how existing funds are used. Key actions to establish evidence of impact as central to educational decisions are as follows:

  1. Invest substantially in practical, replicable approaches to improving outcomes for students, especially achievement outcomes.

Rigorous, high-quality evidence of effectiveness for educational programs has been appearing since about 2006 at a faster rate than ever before, due in particular to investments by the Institute of Education Sciences (IES), Investing in Innovation/Education Innovation Research (i3/EIR), and the National Science Foundation (NSF) in the U.S., and the Education Endowment Foundation in England, but also by other parts of government and private foundations. All have embraced rigorous evaluations involving random assignment to conditions, appropriate measures independent of developers or researchers, and at the higher funding levels, third-party evaluators. These are very important developments, and they have given the research field, educators, and policy makers excellent reasons for confidence that the findings of such research have direct meaning for practice. One problem is that, as is true in every applied field that embraces rigorous research, most experiments do not find positive impacts. Only about 20% of such experiments do find positive outcomes. The solution to this is to learn from successes and failures, so that our success rate improves over time. We also need to support a much larger enterprise of development of new solutions to enduring problems of education, in all subjects and grade levels, and to continue to support rigorous evaluations of the most promising of these innovations. In other words, we should not be daunted by the fact that most evaluations do not find positive impacts, but instead we need to increase the success rate by learning from our own evidence, and to carry out many more experiments. Even 20% of a very big number is a big number.

  2. Improve communications of research findings to researchers, educators, policy makers, and the general public.

Evidence will not make a substantial difference in education until key stakeholders see it as a key to improving students’ success. Improving communications certainly includes making it easy for various audiences to find out which programs and practices are truly effective. But we also need to build excitement about evidence. To do this, government might establish large-scale, widely publicized, certain-to-work demonstrations of the use and outcomes of proven approaches, so that all will see how evidence can lead to meaningful change.

I will be writing in more depth on this topic in future blogs.

3. Set specific standards of evidence, and provide incentive funding for schools to adopt and implement proven practices.

The Every Student Succeeds Act (ESSA) boldly defined “strong,” “moderate,” “promising,” and lower levels of evidence of effectiveness for educational programs, and required use of programs meeting one of the top categories for certain federal funding, especially school improvement funding for low-achieving schools. This certainly increased educators’ interest in evidence, but in practice, it is unclear how much this changed practice or outcomes. These standards need to be made more specific. In addition, the standards need to be applied to funding that is clearly discretionary, to help schools adopt new programs, not to add new evidence requirements to traditional funding sources. The ESSA evidence standards have had less impact than hoped for because they mainly apply to school improvement, a longstanding source of federal funding. As a result, many districts and states have fought hard to have the programs they already have declared “effective,” regardless of their actual evidence base. To make evidence popular, it is important to make proven programs available as something extra, a gift to schools and children rather than a hurdle to continuing existing programs. In coming blogs I’ll write further about how government could greatly accelerate and intensify the process of development, evaluation, communication, and dissemination, so that the entire process can begin to make undeniable improvements in particular areas of critical importance, demonstrating how evidence can make a difference for students.

Photo credit: Deeper Learning 4 All/(CC BY-NC 4.0)


Handling Outbreaks after COVID-19 Re-openings: The Case of Germany

By guest blogger Nathan Storey*

As schools across the U.S. are beginning to reopen in hybrid or full formats, unanticipated outbreaks of COVID are bound to occur. To help schools prepare, we have been writing about strategies schools and districts in other countries have used to combat outbreaks.

In this week’s case study, I examine how Germany has responded to outbreaks and managed school reopening nationwide.

Germany

More than a month after reopening following the summer holiday, German schools are largely still open. Critics and health experts worried in the early weeks as cases in the country appeared to increase (Morris & Weber-Steinhaus, 2020), but schools have been able to continue to operate. Now students sit in classes without masks, and children are allowed to move and interact freely on the playground.

Immediately following the reopening, 31 outbreak clusters (150 cases) were identified in the first week of schooling, and 41 schools in Berlin (out of 825 schools in the region) experienced COVID-19 cases during the first two weeks of schooling, requiring quarantines, testing, and temporary closures. Similar issues occurred across the country as schools reopened in other states. Mecklenburg-Western Pomerania, the first state to reopen, saw 800-plus students from Goethe Gymnasium in Ludwigslust sent home for quarantine after a faculty member tested positive. One hundred primary school students in Rostock district were quarantined for two weeks when a fellow student tested positive. Yet now one month later, German schools remain open. How is this possible?

Germany has focused its outbreak responses on individual student and class-level quarantines instead of shutting down entire schools. Due to active and widespread testing nationwide in the early stages of the outbreak, the country was able to get control of community-level positivity rates, paving the way for schools to reopen both in the spring, and again after summer break. Rates rose in August, but tracking enabled authorities to trace the cases to people returning from summer vacation, not from schools. At schools, outbreaks have generally been limited to one teacher or one student, who have contracted the virus from family or community members, not from within the school.

When these outbreaks occur, schools close for a day awaiting test results, but reopen quickly once affected individuals test negative and can return to class. At Sophie-Charlotte High School in Berlin, three days after reopening, the school received word from the girls’ parents that two students had tested positive. The school in turn informed the local health authority, leading to 191 students and teachers being asked to quarantine at home. Everyone was tested, and two days later they received their test results. Before the week was up, school was back in session. By one estimate, thanks to the efficient testing and individual or class quarantines, fewer than 600 Berlin students (out of more than 366,000 students) have had to stay home for a day (Bennhold, 2020).

So far, there has been one more serious outbreak, at Heinrich Hertz School in Hamburg, where a cluster of 26 students and three teachers received positive diagnoses, potentially infected by one of the teachers. The school moved to quarantine grades six and eight, and mask-wearing rules were more strictly enforced. The school and local health authorities are continuing to study the potential transmission patterns to locate the origin of the cluster.

Testing in Germany is effective because it is extensive, but targeted to those with direct contact with infections. At Heinz-Berggruen school in Berlin, a sixth grader was found to be infected after being tested even though she had no symptoms; someone in her family had tested positive. By tracing the family member’s contacts, health authorities determined that the infection stemmed from international travel, and Heinz-Berggruen remained open, with just the infected student quarantined for two weeks. At Goethe Gymnasium in Ludwigslust, mentioned earlier, the infected teacher was sent home, and all 55 teachers were subsequently tested. The school was able to reopen less than a week later.

Some challenges have arisen. As in the US, German states are responsible for their own COVID-19 prevention measures and must make plans for the case of outbreaks. One city councilor in the Neukölln district of Berlin revealed there was confusion among parents and schools about children’s symptoms and response plans. As a result, children whose only symptoms are runny noses, for instance, have been sent home, and worries are increasing as to how effectively schools and districts will differentiate COVID-19 from flu in the winter.

The German case provides some optimism that schools can manage outbreaks and reopen successfully through careful planning and organization. Testing, contact tracing, and communication are vital, as is lowering of community positivity rates. Cases may be rising in Germany again (Loxton, 2020), but with these strategies and new national COVID management rules in place, the country is in an excellent position to address the challenge.

*Nathan Storey is a graduate student at the Johns Hopkins University School of Education

References

Barton, T., & Parekh, A. (2020, August 11). Reopening schools: Lessons from abroad. https://doi.org/10.26099/yr9j-3620

(2020, June 12). As Europe reopens schools, relief combines with risk. The New York Times. https://www.nytimes.com/2020/05/10/world/europe/reopen-schools-germany.html

Bennhold, K. (2020, August 26). Germany faces a ‘roller coaster’ as schools reopen amid coronavirus. The New York Times. https://www.nytimes.com/2020/08/26/world/europe/germany-schools-virus-reopening.html?smid=em-share

Holcombe, M. (2020, October 5). New York City to close schools in some areas as Northeast sees rise in new cases. CNN. https://www.cnn.com/2020/10/05/health/us-coronavirus-monday/index.html

Loxton, R. (2020, October 15). What you need to know about Germany’s new coronavirus measures for autumn. The Local. https://www.thelocal.de/20201015/what-you-need-to-know-about-germanys-new-coronavirus-measures-for-autumn-and-winter

Medical Xpress. (2020, August 7). Germany closes two schools in new virus blow. https://medicalxpress.com/news/2020-08-germany-schools-virus.html

Morris, L., & Weber-Steinhaus, F. (2020, September 11). Schools have seen no coronavirus outbreaks since reopening a month ago in Germany. The Washington Post. https://www.washingtonpost.com/world/europe/covid-schools-germany/2020/09/10/309648a4-eedf-11ea-bd08-1b10132b458f_story.html

Noryskiewicz, A. (2020, August 25). Coronavirus data 2 weeks into Germany’s school year “reassures” expert. CBS News. https://www.cbsnews.com/news/coronavirus-school-germany-no-outbreaks/

The Associated Press. (2020, August 27). Europe is going back to school despite recent virus surge. Education Week. http://www.edweek.org/ew/articles/2020/08/27/europe-is-going-back-to-school_ap.html?cmp=eml-enl-eu-news2&M=59665135&U=&UUID=4397669ca555af41d7b271f2dafac508


How Can You Tell When The Findings of a Meta-Analysis Are Likely to Be Valid?

In Baltimore, Faidley’s, founded in 1886, is a much loved seafood market inside Lexington Market. Faidley’s used to be a real old-fashioned market, with sawdust on the floor and an oyster bar in the center. People lined up behind their favorite oyster shucker. In a longstanding tradition, the oyster shuckers picked oysters out of crushed ice and tapped them with their oyster knives. If they sounded full, they opened them. But if they did not, the shuckers discarded them.

I always noticed that the line was longer behind the shucker who was discarding the most oysters. Why? Because everyone knew that the shucker who was pickier was more likely to come up with a dozen fat, delicious oysters, instead of say, nine great ones and three…not so great.

I bring this up today to tell you how to pick full, fair meta-analyses on educational programs. No, you can’t tap them with an oyster knife, but otherwise, the process is similar. You want meta-analysts who are picky about what goes into their meta-analyses. Your goal is to make sure that a meta-analysis produces results that truly represent what teachers and schools are likely to see in practice when they thoughtfully implement an innovative program. If instead you pick the meta-analysis with the biggest effect sizes, you will always be disappointed.

As a special service to my readers, I’m going to let you in on a few trade secrets about how to quickly evaluate a meta-analysis in education.

One very easy way to evaluate a meta-analysis is to look at the overall effect size, probably shown in the abstract. If the overall mean effect size is more than about +0.40, you probably don’t have to read any further. Unless the treatment is tutoring or some other treatment that you would expect to make a massive difference in student achievement, it is rare to find a single legitimate study with an effect size that large, much less an average that large. A very large effect size is almost a guarantee that a meta-analysis is full of studies with design features that greatly inflate effect sizes, not studies with outstandingly effective treatments.

Next, go to the Methods section, which will have within it a section on inclusion (or selection) criteria. It should list the types of studies that were or were not accepted into the study. Some of the criteria will have to do with the focus of the meta-analysis, specifying, for example, “studies of science programs for students in grades 6 to 12.” But your focus is on the criteria that specify how picky the meta-analysis is. As one example of a picky set of criteria, here are the main ones we use in Evidence for ESSA and in every analysis we write:

  1. Studies had to use random assignment or matching to assign students to experimental or control groups, with schools and students in each specified in advance.
  2. Students assigned to the experimental group had to be compared to very similar students in a control group, which received business-as-usual instruction. The experimental and control students must be well matched, within a quarter standard deviation at pretest (ES=+0.25), and attrition (loss of subjects) must be no more than 15% higher in one group than the other at the end of the study. Why? It is essential that experimental and control groups start and remain the same in all ways other than the treatment. Controls for initial differences do not work well when the differences are large.
  3. There must be at least 30 experimental and 30 control students. Analyses of combined effect sizes must control for sample sizes. Why? Evidence finds substantial inflation of effect sizes in very small studies.
  4. The treatments must be provided for at least 12 weeks. Why? Evidence finds major inflation of effect sizes in very brief studies, and brief studies do not represent the reality of the classroom.
  5. Outcome measures must be measures independent of the program developers and researchers. Usually, this means using national tests of achievement, though not necessarily standardized tests. Why? Research has found that tests made by researchers can inflate effect sizes by double, or more, and researcher-made measures do not represent the reality of classroom assessment.

There may be other details, but these are the most important. Note that there is a double focus to these standards: each is intended both to minimize bias and to maximize similarity to the conditions faced by schools. What principal or teacher who cares about evidence would be interested in adopting a program evaluated in comparison to a very different control group? Or in a study with few subjects, or a very brief duration? Or in a study that used measures made by the developers or researchers? This set is very similar to what the What Works Clearinghouse (WWC) requires, except #5 (the WWC requires exclusion of “overaligned” measures, but not developer- or researcher-made measures).
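
For readers who like to see such criteria made operational, here is a minimal sketch of how standards like these might be applied as filters over a coded set of studies. The field names and the three example records are hypothetical, invented only to illustrate the logic; they are not drawn from Evidence for ESSA or the WWC.

```python
# Hypothetical illustration: applying selective inclusion standards as filters.

studies = [
    {"id": "A", "design": "RCT",      "pretest_es": 0.05, "attrition_gap": 0.08,
     "n_exp": 120, "n_ctrl": 115, "weeks": 28, "measure": "independent"},
    {"id": "B", "design": "matched",  "pretest_es": 0.40, "attrition_gap": 0.05,
     "n_exp": 45,  "n_ctrl": 40,  "weeks": 6,  "measure": "researcher-made"},
    {"id": "C", "design": "pre-post", "pretest_es": 0.00, "attrition_gap": 0.00,
     "n_exp": 25,  "n_ctrl": 0,   "weeks": 30, "measure": "independent"},
]

def meets_selective_standards(s):
    return (
        s["design"] in {"RCT", "matched"}            # 1. random assignment or matching
        and abs(s["pretest_es"]) <= 0.25             # 2a. pretest difference no more than 0.25 SD
        and s["attrition_gap"] <= 0.15               # 2b. differential attrition no more than 15%
        and s["n_exp"] >= 30 and s["n_ctrl"] >= 30   # 3. at least 30 students per group
        and s["weeks"] >= 12                         # 4. duration of at least 12 weeks
        and s["measure"] == "independent"            # 5. measure independent of developers/researchers
    )

included = [s["id"] for s in studies if meets_selective_standards(s)]
print("Included studies:", included)  # only study "A" survives in this toy example
```

The filters are only as good as the coding behind them, of course, but writing the criteria down this explicitly is a useful discipline: it forces a review to say exactly what counts as a qualifying study before any effect sizes are pooled.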

If these criteria are all there in the “Inclusion Standards,” chances are you are looking at a top-quality meta-analysis. As a rule, it will have average effect sizes lower than those you’ll see in reviews without some or all of these standards, but the effect sizes you see will probably be close to what you will actually get in student achievement gains if your school implements a given program with fidelity and thoughtfulness.

What I find astonishing is how many meta-analyses do not have standards this high. Among experts, these criteria are not controversial, except for the last one, which shouldn’t be. Yet meta-analyses are often written, and accepted by journals, with much lower standards, thereby producing greatly inflated, unrealistic effect sizes.

As one example, there was a meta-analysis of Direct Instruction programs in reading, mathematics, and language, published in the Review of Educational Research (Stockard et al., 2018). I have great respect for Direct Instruction, which has been doing good work for many years. But this meta-analysis was very disturbing.

The inclusion and exclusion criteria in this meta-analysis did not require experimental-control comparisons, did not require well-matched samples, and did not require any minimum sample size or duration. It was not clear how many of the outcome measures were made by program developers or researchers, rather than being independent of the program.

With these minimal inclusion standards, and a very long time span (back to 1966), it is not surprising that the review found a great many qualifying studies. 528, to be exact. The review also reported extraordinary effect sizes: +0.51 for reading, +0.55 for math, and +0.54 for language. If these effects were all true and meaningful, it would mean that DI is much more effective than one-to-one tutoring, for example.

But don’t get your hopes up. The article included an online appendix that showed the sample sizes, study designs, and outcomes of every study.

First, the authors identified eight experimental designs (plus single-subject designs, which were treated separately). Only two of these would meet anyone’s modern standards of meta-analysis: randomized and matched. The others included pre-post gains (no control group), comparisons to test norms, and other pre-scientific designs.

Sample sizes were often extremely small. Leaving aside single-case experiments, there were dozens of single-digit sample sizes (e.g., six students), often with very large effect sizes. Further, there was no indication of study duration.

What is truly astonishing is that RER accepted this study. RER is the top-rated journal in all of education, based on its citation count. Yet this review, and the Kulik & Fletcher (2016) review I cited in a recent blog, clearly did not meet minimal standards for meta-analyses.

My colleagues and I will be working in the coming months to better understand what has gone wrong with meta-analysis in education, and to propose solutions. Of course, our first step will be to spend a lot of time at oyster bars studying how they set such high standards. Oysters and beer will definitely be involved!

Photo credit: Annette White / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

References

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Stockard, J., Wood, T. W., Coughlin, C., & Rasplica Khoury, C. (2018). The effectiveness of Direct Instruction curricula: A meta-analysis of a half century of research. Review of Educational Research, 88(4), 479–507. https://doi.org/10.3102/0034654317751919


Meta-Analysis or Muddle-Analysis?

One of the best things about living in Baltimore is eating steamed hard shell crabs every summer. They are cooked in a very spicy seasoning mix, and with Maryland corn and Maryland beer, they define the very peak of existence for Marylanders. (To be precise, the true culture of the crab also extends into Virginia, but does not really exist more than 20 miles inland from the bay.)

As every crab eater knows, a steamed crab comes with a lot of inedible shell and other inner furniture.  So you get perhaps an ounce of delicious meat for every pound of whole crab. Here is a bit of crab math.  Let’s say you have ten pounds of whole crabs, and I have 20 ounces of delicious crabmeat.  Who gets more to eat?  Obviously I do, because your ten pounds of crabs will only yield 10 ounces of meat. 

How Baltimoreans learn about meta-analysis.

All Baltimoreans instinctively understand this from birth.  So why is this same principle not understood by so many meta-analysts?

I recently ran across a meta-analysis of research on intelligent tutoring programs by Kulik & Fletcher (2016), published in the Review of Educational Research (RER). The meta-analysis reported an overall effect size of +0.66! Considering that the effect size for one-to-one tutoring in mathematics was “only” +0.31 in a rigorous randomized evaluation (Torgerson et al., 2013), it is just plain implausible that the average effect size for a computer-assisted instruction intervention is more than twice as large. Consider that a meta-analysis our group did on elementary mathematics programs found a mean effect size of +0.19 for all digital programs, across 38 rigorous studies (Slavin & Lake, 2008). So how did Kulik & Fletcher come up with +0.66?

The answer is clear. The authors excluded very few studies, other than those of less than 30 minutes’ duration. The studies they included used methods known to greatly inflate effect sizes, but the authors did not exclude studies on this basis or control for these factors. To their credit, they carefully documented the effects of some key methodological factors. For example, they found that “local” measures (presumably made by researchers) had a mean effect size of +0.73, while standardized measures had an effect size of +0.13, replicating findings of many other reviews (e.g., Cheung & Slavin, 2016). They found that studies with sample sizes of fewer than 80 students had an effect size of +0.78, while those with samples of more than 250 had an effect size of +0.30. Brief studies had higher effect sizes than longer ones, as many other reviews have found. All of this is nice to know, but even knowing it all, Kulik & Fletcher did not control for any of it, not even by weighting for sample size. So, for example, the implausible mean effect size of +0.66 includes a study with a sample size of 33, a duration of 80 minutes, and an effect size of +1.17, on a “local” test. Another had 48 students, a duration of 50 minutes, and an effect size of +0.95. Now, if you believe that 80 minutes on a computer is three times as effective for math achievement as months of one-to-one tutoring by a teacher, then I have a lovely bridge in Baltimore I’d like to sell you.

I’ve long been aware of these problems with meta-analyses that neither exclude nor control for characteristics of studies known to greatly inflate effect sizes. This was precisely the flaw for which I criticized John Hattie’s equally implausible reviews. But what I did not know until recently was just how widespread this is.

I was working on a proposal to do a meta-analysis of research on technology applications in mathematics. A colleague located every meta-analysis published on this topic since 2013. She found 20 of them. After looking at the remarkable outcomes of a few, I computed a median effect size across all twenty. It was +0.44. That is, to put it mildly, implausible. Looking further, I discovered that only one of the reviews adjusted for sample size (using inverse variances). Its mean effect size was +0.05. The other 19 meta-analyses, all in respectable journals, neither controlled for methodological features nor excluded studies based on them, and they reported effect sizes as high as +1.02 and +1.05.
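
For readers unfamiliar with the term, adjusting for sample size with inverse variances means weighting each study by the inverse of its sampling variance, so that large studies count for far more than tiny ones. Here is a minimal illustrative sketch, with invented studies, using the standard large-sample approximation for the variance of Cohen's d; it is a toy fixed-effect calculation, not a reconstruction of any of the reviews discussed above.

```python
# Toy fixed-effect, inverse-variance pooling of standardized mean differences.
# The three (d, n_experimental, n_control) studies below are hypothetical.

def d_variance(d, n1, n2):
    # Large-sample approximation for the sampling variance of Cohen's d
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

studies = [
    (1.05, 20, 18),    # tiny study, huge effect
    (0.60, 45, 40),
    (0.15, 220, 230),  # large study, modest effect
]

weights = [1.0 / d_variance(d, n1, n2) for d, n1, n2 in studies]
pooled = sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)
simple_mean = sum(d for d, _, _ in studies) / len(studies)

print(f"Simple (unweighted) mean d: {simple_mean:+.2f}")  # +0.60
print(f"Inverse-variance pooled d:  {pooled:+.2f}")       # +0.27
```

Weighting is no substitute for excluding studies with biasing design features, but it at least keeps a handful of tiny studies from dominating the pooled estimate.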

Meta-analyses are important, because they are widely read and widely cited, in comparison to individual studies. Yet until meta-analyses start consistently excluding, or at least controlling for, studies with factors known to inflate mean effect sizes, they will have little if any meaning for practice. As things stand now, the overall mean impacts reported by meta-analyses in education depend on how stringent the inclusion standards were, not on how effective the interventions truly were.

This is a serious problem for evidence-based reform. Our field knows how to solve it, but all too many meta-analysts do not do so. This needs to change. We see meta-analyses claiming huge impacts, and then wonder why these effects do not transfer to practice. In fact, these big effect sizes do not transfer because they are due to methodological artifacts, not to actual impacts teachers are likely to obtain in real schools with real students.

Ten pounds (160 ounces) of crabs only appear to be more than 20 ounces of crabmeat, because the crabs contain a lot you need to discard. The same is true of meta-analyses. Using small samples, brief durations, and researcher-made measures in evaluations inflates effect sizes without adding anything to the actual impact of treatments for students. Our job as meta-analysts is to strip away the bias as best we can and get to the actual impact. Then we can make comparisons and generalizations that make sense, and move forward our understanding of what really works in education.

In our research group, when we deal with thorny issues of meta-analysis, I often ask my colleagues to imagine that they have a sister who is a principal. “What would you say to her,” I ask, “if she asked what really works, all BS aside? Would you suggest a program that was very effective in a 30-minute study? One that has only been evaluated with 20 students? One that has only been shown to be effective if the researcher gets to make the measure? Principals are sharp, and appropriately skeptical. Your sister would never accept such evidence. Especially if she’s experienced with Baltimore crabs.”

References

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45 (5), 283-292.

Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: a meta-analytic review. Review of Educational Research, 86(1), 42-78.

Slavin, R., & Lake, C. (2008). Effective programs in elementary mathematics: A best-evidence synthesis. Review of Educational Research, 78 (3), 427-515.

Torgerson, C. J., Wiggins, A., Torgerson, D., Ainsworth, H., & Hewitt, C. (2013). Every Child Counts: Testing policy effectiveness using a randomised controlled trial, designed, conducted and reported to CONSORT standards. Research In Mathematics Education, 15(2), 141–153. doi:10.1080/14794802.2013.797746.

Photo credit: Kathleen Tyler Conklin/(CC BY 2.0)


How Much Have Students Lost in The COVID-19 Shutdowns?

Everyone knows that school closures due to the COVID-19 pandemic are having a serious negative impact on student achievement, and that this impact is sure to be larger for disadvantaged students than for others. However, how large will the impact turn out to be? This is not a grim parlor game for statisticians; it could have real meaning for policy and practice. If the losses turn out to be modest, comparable to the “summer slide” we are used to (but which may not exist), then one might argue that when schools open, they can continue where they left off, and students might eventually make up their losses, as they do with summer slide. If, on the other hand, losses are very large, then we need to take emergency action.

Some researchers have used data from summer losses and from other existing data on, for example, teacher strikes, to estimate COVID losses (e.g., Kuhfeld et al., 2020). But now we have concrete evidence, from a country similar to the U.S. in most ways.

A colleague came across a study that has, I believe, the first actual data on this question. It is a recent study from Belgium (Maldonado & DeWitte, 2020) that assessed COVID-19 losses among Dutch-speaking students in that country.

The news is very bad.

The researchers obtained end-of-year test scores from all sixth graders who attend publicly funded Catholic schools, which are attended by most students in Dutch-speaking Belgium. Sixth grade is the final year of primary school, and while schools were mostly closed from March to June due to COVID, the sixth graders were brought back to their schools in late May to prepare for and take their end-of-primary tests. Before returning, the sixth graders had missed about 30% of the days in their school year. They were offered online teaching at home, as in the U.S.

The researchers compared the June test scores to those of students in the same schools in previous years, before COVID. After adjustments for other factors, students scored an effect size of -0.19 in mathematics, and -0.29 in Dutch (reading, writing, language). Schools serving many disadvantaged students had significantly larger losses in both subjects; inequality within the schools increased by 17% in mathematics and 20% in Dutch, and inequality between schools increased by 7% in math and 18% in Dutch.

There is every reason to expect that the situation in the U.S. will be much worse than that in Belgium. Most importantly, although Belgium had one of the worst COVID-19 death rates in the world, it has largely conquered the disease by now (fall), and its schools are all open. In contrast, most U.S. schools are closed or partially closed this fall. Students are usually offered remote instruction, but many disadvantaged students lack access to technology and supervision, and even students who do have equipment and supervision do not seem to be learning much, according to anecdotal reports.

In many U.S. schools that have opened fully or partially, outbreaks of the disease are disrupting schooling, and many parents are refusing to send their children to school. Although this varies greatly by regions of the U.S., the average American student is likely to have missed several more effective months of in-person schooling by the time schools return to normal operation.

But even if average losses turn out to be no worse than those seen in Belgium, the consequences are terrifying, for Belgium as well as for the U.S. and other COVID-inflicted countries.

Effect sizes of -0.19 and -0.29 are very large. From the Belgian data on inequality, we might estimate that for disadvantaged students (those in the lowest 25% of socioeconomic status), losses could have been -0.29 in mathematics and -0.39 in Dutch. What do we have in our armamentarium that is strong enough to overcome losses this large?

In a recent blog, I compared average effect sizes from studies of various solutions currently being proposed to remedy students’ losses from COVID shutdowns: Extended school days, after-school programs, summer school, and tutoring. Only tutoring, both one-to-one and one-to-small group, in reading and mathematics, had an effect size larger than +0.10. In fact, there are several one-to-one and one-to-small group tutoring models with effect sizes of +0.40 or more, and averages are around +0.30. Research in both reading and mathematics has shown that well-trained teaching assistants using structured tutoring materials or software can obtain outcomes as good as those obtained by certified teachers as tutors. On the basis of these data, I’ve been writing about a “Marshall Plan” to hire thousands of tutors in every state to provide tutoring to students scoring far below grade level in reading and math, beginning with elementary reading (where the evidence is strongest).

I’ve also written about national programs in the Netherlands and in England to provide tutoring to struggling students. Clearly, we need a program of this kind in the U.S. And if our scores are like the Belgian scores, we need it as quickly as possible. Students who have fallen far below grade level cannot be left to struggle without timely and effective assistance, powerful enough to bring them at least to where they would have been without the COVID school closures. Otherwise, these students are likely to lose motivation, and to suffer lasting damage. An entire generation of students, harmed through no fault of their own, cannot be allowed to sink into failure and despair.

References

Kuhfeld, M., Soland, J., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020). Projecting the potential impacts of COVID-19 school closures on academic achievement. (EdWorkingPaper: 20-226). Retrieved from Annenberg Institute at Brown University: https://doi.org/10.26300/cdrv-yw05

Maldonado, J. E., & DeWitte, K. (2020). The effect of school closures on standardized student test outcomes. Leuven, Belgium: University of Leuven.

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Learning from International Schools Part II: Outbreaks after COVID-19 Re-openings: The Case of Israel

By guest blogger Nathan Storey*

The summer is over and fall semester is underway across the United States. Schools are reopening and students are back in the classroom, either virtually or in the flesh. Up to now, the focus of discussion has been about whether and how to open schools: in person, using remote instruction, or some mix of the two. But as schools actually open, those with any element of in-person teaching are starting to worry about how they will handle any outbreaks, should they occur. In fact, many countries that opened their schools before the U.S. have actually experienced outbreaks, and this blog focuses on learning from the tragic experience of Israel.  

With in-person schooling, outbreaks are all but inevitable. “We have to be realistic…if we are reopening schools, there will be some Covid,” says Dr. Benjamin Linas, associate professor of medicine and epidemiology at Boston University (Nierenberg & Pasick, 2020). Even though U.S. schools have already reopened, it is not too late to put outbreak plans into place in order to stem any future outbreaks and allow schools to remain in session.

Israel

On Thursday, September 17, Israel’s school system was shut down due to rising positivity rates; 5,523 new cases were recorded in one day prior to the decision, in a country about one fortieth the size of the U.S. The closures are due to last until October 11, though special education and youth-at-risk programs are continuing. The spike in COVID cases reported by health officials centered around children 10 years of age and up. “The government made the wrong decision, against professional recommendations,” COVID commissioner and Professor Ronni Gamzu wrote in a letter to Health Minister Yuli Edelstein and Education Minister Yoav Gallant.

Israel has been a cautionary tale since reopening schools in May. By July, 977 students and teachers had been diagnosed with COVID, 22,520 had been quarantined, and 393 schools and kindergartens had been closed by the Education Ministry (Kershner & Belluck, 2020; Tarnopolsky, 2020). At the beginning of September, 30 “red” cities and neighborhoods were placed under lockdown due to spikes. Almost 4,000 students and over 1,600 teachers are currently in quarantine, while more than 900 teachers and students have been diagnosed with the virus (Savir, 2020).

Schools initially reopened following a phased approach and using social distancing and mask protocols. Students with diagnosed family members were not allowed back, and older staff members and those at risk were told not to return to the classroom. It seemed as if they were doing everything right. But then, a heat wave wiped all the progress away.

Lifting the face mask requirement for four days and allowing schools to shut their windows (so they could run air conditioning) offered new opportunities for the virus to run rampant. An outbreak at Gymnasia Rehavia, a high school in Jerusalem, turned into the largest single-school outbreak seen so far, soon reaching students’ homes and communities. Outbreaks also appeared outside of the Jerusalem area, including in an elementary school in Jaffa. Reflecting on the nationwide spread of the virus, researchers have estimated that as much as 47% of the total new infections in the whole of Israel could be traced to Israeli schools (Tarnopolsky, 2020), with the virus introduced to schools by adult teachers and employees and spread by students, particularly middle school-aged children.

This crisis serves to illustrate just how important it is for education leaders, teachers, and students to remain vigilant in prevention efforts. The Israeli schools largely had the right ideas to ensure prevention. Some challenges existed, particularly related to fitting students into classrooms while maintaining six feet separation given large class sizes (in some cases, classrooms of 500 square feet have to hold as many as 38 students). But by relaxing their distancing regulations, the schools opened students, staff, and communities to a major outbreak.

Schools responded by quarantining individual students, classmates of infected students, teachers, and staff, and when a second unconnected case was detected, schools would close for two weeks. But Israel did not place a priority on contact tracing and testing. Students and staff were tested following outbreaks, but they experienced long wait times for tests, increasing the opportunities for spread. In the case of one school outbreak, Professor Eli Waxman of the Weizmann Institute of Science reported that school officials could not identify which buses students took to reach school (Kershner & Belluck, 2020). Having this type of information is vital for tracing whom infected students may have come into contact with, especially for younger students who may not be able to list all those with whom they have been in close contact.

Before the fall semester began, it looked as if Israel had learned from its previous mistakes. The Education Ministry disseminated new regulations adapted to the local level based on infection rates, and once more planned a phased reopening, starting with grades K-4, followed by middle- and high-school students, who were set to follow a hybrid of remote and in-person instruction. Schools planned to use plastic barriers to separate students in the classroom. Education leaders were to develop a guidebook to support the transition from in-person to distance learning, along with procedures to maintain distancing during celebrations or graduation ceremonies.

These precautions and adaptive plans suggested that Israel had learned from the mistakes made in the summer. Upon reopening, however, a new lesson was learned: schools cannot stay open in a sustainable, long-term manner if community positivity rates are not under control.

*Nathan Storey is a graduate student at the Johns Hopkins University School of Education

References

Couzin-Frankel, J., Vogel, G., & Weil, M. (2020, July 7). School openings across globe suggest ways to keep coronavirus at bay, despite outbreaks. Science | AAAS. https://www.sciencemag.org/news/2020/07/school-openings-across-globe-suggest-ways-keep-coronavirus-bay-despite-outbreaks

Jaffe-Hoffman, M. (2020, September 16). 5,500 new coronavirus cases, as gov’t rules to close schools Thursday. The Jerusalem Post. https://www.jpost.com/breaking-news/coronavirus-4973-new-cases-in-the-last-day-642338

Kauffman, J. (2020, July 29). Israel’s hurried school reopenings serve as a cautionary tale. The World from PRX. https://www.pri.org/stories/2020-07-29/israels-hurried-school-reopenings-serve-cautionary-tale

Kershner, I., & Belluck, P. (2020, August 4). When Covid subsided, Israel reopened its schools. It didn’t go well. The New York Times. https://www.nytimes.com/2020/08/04/world/middleeast/coronavirus-israel-schools-reopen.html

Nierenberg, A., & Pasick, A. (2020, September 16). For school outbreaks, it’s when, not if. The New York Times. https://www.nytimes.com/2020/09/16/us/for-school-outbreaks-its-when-not-if.html

Savir, A. (2020, September 1). 2.4 million Israeli students go back to school in shadow of COVID-19. J-Wire. https://www.jwire.com.au/2-4-million-israeli-students-go-back-to-school-in-shadow-of-covid-19/

Schwartz, F., & Lieber, D. (2020, July 14). Israelis fear schools reopened too soon as Covid-19 cases climb. Wall Street Journal. https://www.wsj.com/articles/israelis-fear-schools-reopened-too-soon-as-covid-19-cases-climb-11594760001

Tarnopolsky, N. (2020, July 14). Israeli data show school openings were a disaster that wiped out lockdown gains. The Daily Beast. https://www.thedailybeast.com/israeli-data-show-school-openings-were-a-disaster-that-wiped-out-lockdown-gains

Photo credit: Talmoryair / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org

Learning from International Schools: Outbreaks after COVID-19 Re-openings: The Case of the United Kingdom

By guest blogger Nathan Storey, Johns Hopkins University*

For much of the summer, U.S. education leaders and media have questioned how to safely reopen schools to students and teachers. Districts have struggled to put together concrete plans for how to structure classes, how much of the instruction would be in person, how to maintain social distancing in the classroom, and how to minimize health risks.

Most school districts have focused on preventing outbreaks through masks and social distancing, among other measures. However, this has left a gap: what happens to these well-thought-out plans if and when there is an outbreak? While many school districts (including 12 of the 15 largest in the United States) have opted to start schooling remotely, many others plan to restart, or have already restarted, in-person schooling, often without detailed prevention and response plans in place.

For those districts committed to in-person schooling, outbreaks in at least some schools are all but inevitable. Community positivity rates within the United States remain high, with some states experiencing positivity rates of up to 5.4% (CDC, 2020), compared to 2.3% in Scotland or 0.8% across the entire United Kingdom (JHU, 2020). The image of students without masks packed into the hallways of a Georgia school has already spread nationwide. It is clearly important to put outbreak response plans into place as soon as possible in order to stem any outbreaks and allow schools to remain in session.

In a series of case studies, I will examine the experiences of how other countries with similar education systems dealt with outbreaks in their schools and share lessons learned for the United States.

United Kingdom

Schools in England and Wales finally reopened last week for the fall semester, but Scottish schools reopened the week of August 10. Outbreaks in Scotland have been minimal, but a cluster of school outbreaks cropped up in the Glasgow region, most notably at Bannerman High School. Affected schools soon closed for one week following the positive tests, but students who tested positive remained at home in self-isolation for 14 days.

What makes this outbreak notable is that through testing of students and community members, researchers were able to trace the outbreak to a cluster of infections amongst senior managers at McVities biscuit factory, also in Glasgow. Having successfully traced the infections to this source, education leaders and researchers were able to determine that cases were not being transmitted within schools, and put into effect appropriate isolation procedures for potentially infected students and faculty.

Testing and contact tracing were first conducted during the spring and summer months, when schools initially reopened in the UK following the national shutdown in March. Researchers (Ismail et al., 2020) were able to determine the sources of outbreaks and their prevalence amongst students and faculty, finding that transmission within schools was relatively uncommon. This provided crucial information that improved understanding of COVID and informed quarantine and school lockdown protocols in the country.

Scotland has put into place a strong contact tracing protocol, coupled with self-isolation, social distancing, and more intensive hygiene protocols. Scientists from England have urged weekly testing of teachers, as well as “test and trace” protocols, but the schools minister, Nick Gibb, instead committed to testing only symptomatic individuals. Researcher Michael Fischer recently launched the COVID-19 Volunteer Testing Network, hoping to create a network of laboratories across the UK that use basic equipment common in most labs (specifically, a polymerase chain reaction, or PCR, machine) to provide rapid testing. Eventually, as many as 1,000 labs could each run 800 tests a day, providing rapid turnaround on COVID-19 tests, enabling more effective contact tracing, and allowing schools to isolate individual students and staff members without shutting down entirely.

Another means of accelerating testing and contact tracing is through group or pooled testing. One scientist in England pointed to this form of testing—in which multiple individuals’ samples are pooled together and tested simultaneously, with subsequent individual tests in the event of a positive test result—as a means of providing quick testing even if testing materials are limited. This could be particularly useful for schools implementing clustered classrooms or educational pods, keeping students together throughout the day and limiting contact with other students and staff.
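
To see why pooling saves tests when positivity is low, here is a minimal sketch in Python of the standard Dorfman-style calculation of expected tests per person; the prevalence figure is an assumption for illustration, not a number taken from any of the sources cited here.

```python
# A minimal sketch of the arithmetic behind Dorfman-style pooled testing.
# The 1% prevalence figure is an assumption for illustration, not a number
# taken from any of the sources cited in this blog.

def expected_tests_per_person(pool_size: int, prevalence: float) -> float:
    """Expected tests per person: one pooled test shared by the whole pool,
    plus individual follow-up tests whenever the pool comes back positive."""
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    return 1 / pool_size + p_pool_positive

for pool_size in (5, 10, 20):
    tests = expected_tests_per_person(pool_size, prevalence=0.01)
    print(f"Pool of {pool_size:2d}: about {tests:.2f} tests per person")
```

At an assumed 1% prevalence, pools of about 10 require roughly one fifth of a test per person rather than a full test each; the savings shrink as community positivity rises, which is why pooling is most attractive where rates are already low.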

Careful and thorough testing and contact tracing, as exemplified by the United Kingdom’s efforts, coupled with social distancing and other preventative measures, could help United States school districts in areas with low positivity rates (comparable to those in the United Kingdom) address outbreaks more systematically, avoiding entire school shutdowns, which are disruptive to students’ education. Preventative measures alone are not likely to be enough to get students and staff through what promises to be a difficult school year. Outbreak-responsive systems like these are likely to be necessary as well.

References

Brazell, E. (2020, April 2). Scientist donates £1,000,000 to massively increase UK coronavirus testing. Metro. https://metro.co.uk/2020/04/02/scientist-donates-1000000-massively-increase-uk-coronavirus-testing-12499729/

CDC. (2020, September 4). COVIDView, Key Updates for Week 33. Centers for Disease Control and Prevention. https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

Davis, N. (2020, August 10). Scientists urge routine Covid testing when English schools reopen. The Guardian. https://www.theguardian.com/education/2020/aug/10/scientists-urge-routine-covid-testing-when-english-schools-reopen

Duffy, E. (2020, August 19). Scots school closes with immediate effect after multiple confirmed cases of Covid-19. The Herald. https://www.heraldscotland.com/news/18662461.kingspark-school-dundee-school-closes-multiple-cases-covid-19-confirmed/

Government of United Kingdom. (2020, September 8). Coronavirus (COVID-19) in the UK: UK Summary. https://coronavirus.data.gov.uk/

Ismail, S. A., Saliba, V., Bernal, J. L., Ramsay, M. E., & Ladhani, S. N. (2020). SARS-CoV-2 infection and transmission in educational settings: Cross-sectional analysis of clusters and outbreaks in England (pp. 1–28). Public Health England. https://doi.org/10.1101/2020.08.21.20178574

Johns Hopkins University. (2020, September 8). Daily Testing Trends in the US – Johns Hopkins. Johns Hopkins Coronavirus Resource Center. https://coronavirus.jhu.edu/testing/individual-states

Macpherson, R. (2020, August 16). Coronavirus Scotland: Another pupil at Bannerman High School in Glasgow tests positive as cluster hits 12 cases. The Scottish Sun. https://www.thescottishsun.co.uk/news/5937611/coronavirus-scotland-bannerman-high-school-covid19/

Palmer, M. (2020, April 1). Call for small UK labs to embrace Dunkirk spirit and produce Covid-19 tests. Sifted. https://sifted.eu/articles/uk-labs-coronavirus-testing/

*Nathan Storey is a graduate student at the Johns Hopkins University School of Education

This blog was developed with support from Arnold Ventures. The views expressed here do not necessarily reflect those of Arnold Ventures.

Note: If you would like to subscribe to Robert Slavin’s weekly blogs, just send your email address to thebee@bestevidence.org