Evidence Affects School Change and Teacher-by-Teacher Change Differently

Nell Duke, now a distinguished professor at the University of Michigan, likes to tell a story about using cooperative learning as a young teacher. She had read a lot about cooperative learning and was excited to try it in her elementary class. However, not long after she started, her principal came to her class and asked her to step into the hall. “Miss Duke,” he said, “what in blazes are you doing in there?”

Nell told her principal all about cooperative learning, and how strongly the research supported it, and how her students were so excited to work in groups and help each other learn.

“Cooperative learning?” said her principal. “Well, I suppose that’s all right. But from now on could you do it quietly?”

Nell Duke’s story exemplifies one of the most important problems in research-based reform in education. Should research-based reform focus on teachers or on schools? Nell was following the evidence, and her students were enjoying the new method and seemed to be learning better because of it. Yet in her school, she was the only teacher using cooperative learning. As a result, she did not have the support or understanding of her principal, or even of her fellow teachers. Her principal had rules about keeping noise levels down, and he was not about to make an exception for one teacher.

However, the problem of evidence-based reform for teachers as opposed to schools goes far beyond the problems of one noisy classroom. The problem is that it is difficult to do reform one teacher at a time. In fact, it is very difficult to even do high-quality program evaluations at the teacher level, and as a result, most programs listed as effective in the What Works Clearinghouse or Evidence for ESSA are designed for use at least in whole grade levels, and often in whole schools. One reason for this is that it is more cost-effective to provide coaching to whole schools or grade levels. Most successful programs provide initial professional development to many teachers and then follow up with coaching visits to teachers using new methods, to give them feedback and encouragement. It is too expensive for most schools to provide extensive coaching to just one or a small number of teachers. Further, multiple teachers working together can support each other, ask each other questions, and visit each other’s classes. Principals and other administrative staff can support the whole school in using proven programs, but a principal responsible for many teachers is not likely to spend a lot of time learning about a method used by just one or two teachers.


When we were disseminating cooperative learning programs in the 1980s, we started off providing large workshops for anyone who wanted to attend. These were very popular and teachers loved them, but when we checked in a year later, many teachers were not using the methods they’d learned. Why? The answer was most often that teachers had difficulty sustaining a new program without much support from their leadership or colleagues. We’d found that on-site coaching was essential for quality implementation, but we could not provide coaching to widely dispersed schools. Instead, we began to focus on school-wide implementations of cooperative learning. This soon led to our development and successful evaluations of Success for All, as we learned that working with whole schools made it possible not only to ensure high-quality implementations of cooperative learning, but also to add in grouping strategies, tutoring for struggling readers, parent involvement approaches, and other elements that would have been impossible in a teacher-by-teacher approach to change.

Compared to our experience with cooperative learning focused on individual teachers, Success for All has been both more effective and longer lasting. The median Success for All school has used the program for 11 years, for example.

Of course, it is still important to have research-based strategies that teachers can use on their own. Cooperative learning itself can be used this way, as can proven strategies for classroom management, instruction, assessment, feedback, and much more. Yet it is often the case that practices suggested to individual teachers were in fact evaluated in whole school or grade levels. It is probably better for teachers to use programs proven effective in school-level research than to use unevaluated approaches, but teachers using such programs on their own should be aware that teachers in school-level evaluations probably received a lot of professional development and in-class coaching. To get the same results, individual teachers might visit others using the programs successfully, or at a minimum participate in social media conversations with other teachers using the same approaches.

Individual teachers interested in using proven programs and practices might do best to make common cause with colleagues and approach the principal about trying the new method in their grade level or in the school as a whole. This way, it is possible to obtain the benefits of school-wide implementation while playing an active role in the process of innovation.

There are never guarantees in any form of innovation, but teachers who are eager to improve their teaching and their students’ learning can work with receptive principals to systematically try out and informally evaluate promising approaches. Perhaps nothing would have changed the mind of Nell Duke’s principal, but most principals value initiative on the part of their teachers to try out likely solutions to improve students’ learning.

The number of children who need proven programs to reach their full potential is vast. Whenever possible, shouldn’t we try to reach larger numbers of students with well-conceived and well-supported implementations of proven teaching methods?

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Queasy about Quasi-Experiments? How Rigorous Quasi-Experiments Can Minimize Bias

I once had a statistics professor who loved to start discussions of experimental design with the following:

“First, pick your favorite random number.”

Obviously, if you pick a favorite random number, it isn’t random. I was recalling this bit of absurdity recently when discussing with colleagues the relative value of randomized experiments (RCTs) and matched studies, or quasi-experimental designs (QEDs). In randomized experiments, students, teachers, classes, or schools are assigned at random to experimental or control conditions. In quasi-experiments, a group of students, teachers, classes, or schools is identified as the experimental group, and then other schools (usually in the same districts) are located and matched on key variables, such as prior test scores, percent free lunch, ethnicity, and perhaps other factors. The ESSA evidence standards, the What Works Clearinghouse, Evidence for ESSA, and most methodologists favor randomized experiments over QEDs, but there are situations in which RCTs are not feasible. In a recent “Straight Talk on Evidence,” Jon Baron discussed how QEDs can approach the usefulness of RCTs. In this blog, I build on Baron’s article and go further into strategies for getting the best, most unbiased results possible from QEDs.

Randomized and quasi-experimental studies are very similar in most ways. Both almost always compare experimental and control schools that were very similar on key performance and demographic factors. Both use the same statistics, and require the same number of students or clusters for adequate power. Both apply the same logic, that the control group mean represents a good approximation of what the experimental group would have achieved, on average, if the experiment had never taken place.
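To make the point about statistical power concrete, here is a minimal sketch (my own illustration; the function name and every number below are hypothetical, not taken from any study discussed here) of the standard normal-approximation calculation for how many schools a clustered study needs per condition. The same arithmetic applies whether schools are randomized or matched.

```python
# Rough sample-size sketch for a study that assigns whole schools to
# conditions. Applies equally to randomized and matched designs.
# All parameter values below are illustrative only.
from scipy.stats import norm

def clusters_per_arm(effect_size, students_per_school, icc,
                     alpha=0.05, power=0.80):
    """Approximate number of schools needed per condition."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    design_effect = 1 + (students_per_school - 1) * icc  # penalty for clustering
    return 2 * z**2 * design_effect / (students_per_school * effect_size**2)

# Example: to detect ES = +0.20 with 60 tested students per school and an
# intraclass correlation of 0.10, you need roughly 45 schools per condition.
print(round(clusters_per_arm(0.20, 60, 0.10)))  # ~45
```

The point of the sketch is simply that the clustering penalty, not the choice between randomization and matching, is what drives how many schools either kind of study requires.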

However, there is one big difference between randomized and quasi-experiments. In a well-designed randomized experiment, the experimental and control groups can be assumed to be equal not only on observed variables, such as pretests and socio-economic status, but also on unobserved variables. The unobserved variables we worry most about have to do with selection bias. How did it happen (in a quasi-experiment) that the experimental group chose to use the experimental treatment, or was assigned to the experimental treatment? If a set of schools decided to use the experimental treatment on their own, then these schools might be composed of teachers or principals who are more oriented toward innovation, for example. Or if the experimental treatment is difficult, the teachers who would choose it might be more hard-working. If it is expensive, then perhaps the experimental schools have more money. Any of these factors could bias the study toward finding positive effects, because schools that have teachers who are motivated or hard-working, in schools with more resources, might perform better than control schools with or without the experimental treatment.
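To see why unobserved selection matters so much, here is a small simulation sketch (my own illustration with made-up numbers; it is not an analysis of any real study). The program has no true effect at all, but schools with more motivated staff are more likely to adopt it, and motivation also raises achievement, so the quasi-experimental comparison shows an apparent benefit while random assignment does not.

```python
# Hypothetical illustration of selection bias. The "treatment" has zero
# true effect, but unobserved motivation both raises achievement and makes
# schools more likely to adopt the treatment.
import numpy as np

rng = np.random.default_rng(0)
n_schools = 10_000

motivation = rng.normal(0, 1, n_schools)                      # unobserved
achievement = 0.3 * motivation + rng.normal(0, 1, n_schools)  # true program effect = 0

# Quasi-experiment: schools self-select; more motivated schools adopt more often
adopted = rng.random(n_schools) < 1 / (1 + np.exp(-motivation))

# Randomized experiment: a coin flip, unrelated to motivation
randomized = rng.random(n_schools) < 0.5

def apparent_effect(in_treatment):
    return achievement[in_treatment].mean() - achievement[~in_treatment].mean()

print(f"Quasi-experimental 'effect': {apparent_effect(adopted):+.2f}")    # clearly > 0
print(f"Randomized 'effect':         {apparent_effect(randomized):+.2f}")  # about 0.00
```

Matching on observed pretests helps only to the extent that the pretest captures the unobserved motivation; whatever the pretest misses shows up as bias.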


Because of this problem of selection bias, studies that use quasi-experimental designs generally have larger effect sizes than do randomized experiments. Cheung & Slavin (2016) studied the effects of methodological features of studies on effect sizes. They obtained effect sizes from 645 studies of elementary and secondary reading, mathematics, and science, as well as early childhood programs. These studies had already passed a screening in which they would have been excluded if they had serious design flaws. The results were as follows:

                         No. of studies   Mean effect size
Quasi-experiments              449             +0.23
Randomized experiments         196             +0.16

Clearly, mean effect sizes were larger in the quasi-experiments, suggesting the possibility that there was bias. Compared to factors such as sample size and use of developer- or researcher-made measures, the amount of effect size inflation in quasi-experiments was modest, and some meta-analyses comparing randomized and quasi-experimental studies have found no difference at all.
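For readers who want the metric spelled out, the effect sizes throughout this discussion are standardized mean differences; the standard definition (not specific to the Cheung & Slavin review) is:

```latex
ES = \frac{\bar{X}_{\text{experimental}} - \bar{X}_{\text{control}}}{SD_{\text{pooled}}},
\qquad
SD_{\text{pooled}} = \sqrt{\frac{(n_E - 1)\,SD_E^{\,2} + (n_C - 1)\,SD_C^{\,2}}{n_E + n_C - 2}}
```

On this scale, the difference between the two rows of the table, about 0.07 standard deviations, is the average inflation associated with quasi-experimental designs in that sample of studies.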

Relative Advantages of Randomized and Quasi-Experiments

Because of the problems of selection bias, randomized experiments are preferred to quasi-experiments, all other factors being equal. However, there are times when quasi-experiments may be necessary for practical reasons. For example, it can be easier to recruit and serve schools in a quasi-experiment, and it can be less expensive. A randomized experiment requires that schools be recruited with the promise that they will receive an exciting program. Yet half of them will instead be in a control group, and to keep them willing to sign up, they may be given a lot of money, or an opportunity to receive the program later on. In a quasi-experiment, the experimental schools all get the treatment they want, and control schools just have to agree to be tested.  A quasi-experiment allows schools in a given district to work together, instead of insisting that experimental and control schools both exist in each district. This better simulates the reality schools are likely to face when a program goes into dissemination. If the problems of selection bias can be minimized, quasi-experiments have many attractions.

An ideal design for quasi-experiments would obtain the same unbiased outcomes as a randomized evaluation of the same treatment might do. The purpose of this blog is to discuss ways to minimize bias in quasi-experiments.

In practice, there are several distinct forms of quasi-experiments. Some have considerable likelihood of bias. However, others have much less potential for bias. In general, quasi-experiments to avoid are forms of post-hoc, or after-the-fact designs, in which determination of experimental and control groups takes place after the experiment. Quasi-experiments with much less likelihood of bias are pre-specified designs, in which experimental and control schools, classrooms, or students are identified and registered in advance. In the following sections, I will discuss these very different types of quasi-experiments.

Post-Hoc Designs

Post-hoc designs generally identify schools, teachers, classes, or students who participated in a given treatment, and then find matches for each in routinely collected data, such as district or school standardized test scores, attendance, or retention rates. The routinely collected data (such as state test scores or attendance) are collected as pre- and posttests from school records, so it may be that neither experimental nor control schools’ staffs are even aware that the experiment happened.

Post-hoc designs sound valid; the experimental and control groups were well matched at pretest, so if the experimental group gained more than the control group, that indicates an effective treatment, right?

Not so fast. There is much potential for bias in this design. First, the experimental schools are almost invariably those that actually implemented the treatment. Any schools that dropped out or (even worse) any that were deemed not to have implemented the treatment enough have disappeared from the study. This means that the surviving schools were different in some important way from those that dropped out. For example, imagine that in a study of computer-assisted instruction, schools were dropped if fewer than 50% of students used the software as much as the developers thought they should. The schools that dropped out must have had characteristics that made them unable to implement the program sufficiently. For example, they might have been deficient in teachers’ motivation, organization, skill with technology, or leadership, all factors that might also impact achievement with or without the computers. The experimental group is only keeping the “best” schools, but the control schools will represent the full range, from best to worst. That’s bias. Similarly, if individual students are included in the experimental group only if they actually used the experimental treatment a certain amount, that introduces bias, because the students who did not use the treatment may be less motivated, have lower attendance, or have other deficits.

As another example, developers or researchers may select experimental schools that they know did exceptionally well with the treatment. Then they may find control schools that match on pretest. The problem is that there could be unmeasured characteristics of the experimental schools that could cause these schools to get good results even without the treatment. This introduces serious bias. This is a particular problem if researchers pick experimental or control schools from a large database. The schools will be matched at pretest, but since the researchers may have many potential control schools to choose among, they may use selection rules that, while they maintain initial equality, introduce bias. The readers of the study might never be able to find out if this happened.

Pre-Specified Designs

The best way to minimize bias in quasi-experiments is to identify experimental and control schools in advance (as contrasted with post hoc), before the treatment is applied. After experimental and control schools, classes, or students are identified and matched on pretest scores and other factors, the names of schools, teachers, and possibly students on each list should be registered on the Registry of Efficacy and Effectiveness Studies. This way, all schools (and all students) involved in the study are counted in intent-to-treat (ITT) analyses, just as is expected in randomized studies. The total effect of the treatment is based on this list, even if some schools or students dropped out along the way. An ITT analysis reflects the reality of program effects, because it is rare that all schools or students actually use educational treatments. Such studies also usually report effects of treatment on the treated (TOT), focusing on schools and students who did implement the treatment, but such analyses are of only minor interest, as they are known to reflect bias in favor of the treatment group.
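As a concrete sketch of the difference between the two analyses (the school names, column names, and gain scores below are invented purely for illustration): in an ITT analysis every pre-registered school stays in its original condition, while a TOT analysis quietly drops the treatment schools that failed to implement, which is exactly where the bias creeps in.

```python
# Hypothetical sketch of intent-to-treat (ITT) vs. treatment-on-the-treated
# (TOT) estimates in a pre-specified quasi-experiment. All data are invented.
import pandas as pd

schools = pd.DataFrame({
    "school":      ["A", "B", "C", "D", "E", "F", "G", "H"],
    "condition":   ["T", "T", "T", "T", "C", "C", "C", "C"],  # registered in advance
    "implemented": [True, True, False, True, False, False, False, False],
    "gain":        [0.30, 0.25, 0.05, 0.28, 0.10, 0.12, 0.08, 0.11],
})

treat = schools[schools.condition == "T"]
control = schools[schools.condition == "C"]

# ITT: every registered treatment school counts, even the non-implementer.
itt = treat.gain.mean() - control.gain.mean()

# TOT: drops the non-implementer, which tends to flatter the program.
tot = treat[treat.implemented].gain.mean() - control.gain.mean()

print(f"ITT estimate: {itt:+.2f}")   # +0.12
print(f"TOT estimate: {tot:+.2f}")   # +0.17
```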

Because most government funders in effect require use of random assignment, the number of quasi-experiments is rapidly diminishing. All things being equal, randomized studies should be preferred. However, quasi-experiments may better fit the practical realities of a given treatment or population, and as such, I hope there can be a place for rigorous quasi-experiments. We need not be so queasy about quasi-experiments if they are designed to minimize bias.

References

Baron, J. (2019, December 12). Why most non-RCT program evaluation findings are unreliable (and a way to improve them). Washington, DC: Arnold Ventures.

Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45 (5), 283-292.

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Why Can’t Education Progress Like Medicine Does?

I recently saw an end-of-year article in The Washington Post called “19 Good Things That Happened in 2019.” Four of them were medical or public health breakthroughs. Scientists announced a new therapy for cystic fibrosis likely to benefit 90% of people with this terrible disease, incurable for most patients before now. The World Health Organization announced a new vaccine to prevent Ebola. The Bill and Melinda Gates Foundation announced that deaths of children before their fifth birthday have now dropped from 82 per thousand births in 1990 to 37 in 2019. The Centers for Disease Control reported a decline of 5.1 percent in deaths from drug overdoses in just one year, from 2017 to 2018.

Needless to say, breakthroughs in education did not make the list. In fact, I’ll bet there has never been an education breakthrough mentioned on such lists.

I get a lot of criticism from all sides for comparing education to medicine and public health. Most commonly, I’m told that it’s ever so much easier to give someone a pill than to change complex systems of education. That’s true enough, but not one of the 2019 medical or public health breakthroughs was anything like “taking a pill.” The cystic fibrosis cure involves a series of three treatments personalized to the genetic background of patients. It took decades to find and test this treatment. A vaccine for Ebola may be simple in concept, but it also took decades to develop. Also, Ebola occurs in very poor countries, where ensuring universal coverage with a vaccine is very complex. Reducing deaths of infants and toddlers took massive coordinated efforts of national governments, international organizations, and ongoing research and development. There is still much to do, of course, but the progress made so far is astonishing. Similarly, the drop in deaths due to overdoses required, and still requires, huge investments, cooperation between government agencies of all sorts, and constant research, development, and dissemination. In fact, I would argue that reducing infant deaths and overdose deaths strongly resembles what education would have to do to, for example, eliminate reading failure or enable all students to succeed at middle school mathematics. No one distinct intervention, no one miracle pill has by itself improved infant mortality or overdose mortality, and solutions for reading and math failure will similarly involve many elements and coordinated efforts among many government agencies, private foundations, and educators, as well as researchers and developers.

The difference between evidence-based reform in medicine/public health and education is, I believe, a difference in societal commitment to solving the problems. The general public, especially political leaders, tend to be rather complacent about educational failures. One of our past presidents said he wanted to help, but lamented that we have “more will than wallet” to solve educational problems. Another focused his education plans on recruiting volunteers to help with reading. These policies hardly communicate seriousness. In contrast, if medicine or public health can significantly reduce death or disease, it’s hard to be complacent.

Perhaps part of the motivational difference is due to the situations of powerful people. Anyone can get a disease, so powerful individuals are likely to have children or other relatives or friends who suffer from a given disease. In contrast, they may assume that children failing in school have inadequate parents, or parents who need improved job opportunities, economic security, or decent housing, problems that will take decades and massive investments to solve. As a result, governments allocate little money for research, development, or dissemination of proven programs.

There is no doubt in my mind that we could, for example, eliminate early reading failure, using the same techniques used to eliminate diseases: research, development, practical experiments, and planful, rapid scale-up. It’s all a question of resources, political leadership, collaboration among many critical agencies and individuals, and a total commitment to getting the job done. The year reading failure drops to near zero nationwide, perhaps education will make the Washington Post list of “50 Good Things That Happened in 2050.”

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

What Works in Professional Development

I recently read an IES-funded study, called “The Effects of a Principal Professional Development Program Focused on Instructional Leadership.” The study, reported by a research team at Mathematica (Hermann et al., 2019), was a two-year evaluation of a Center for Educational Leadership (CEL) program in which elementary principals received 188 hours of PD, including a 28-hour summer institute at the beginning of the program, quarterly virtual professional learning community sessions in which principals met with other principals and CEL coaches, and 50 hours per year of individual coaching in which principals worked with their CEL coaches to set goals, implement strategies, and analyze the effects of those strategies. Principals helped teachers improve instruction by observing teachers, giving feedback, and selecting curricula; sought to improve their recruitment, management, and retention strategies; held PD sessions for teachers; and focused on setting a school mission, improving school climate, and deploying resources effectively.

A total of 100 low-achieving schools were recruited. Half received the CEL program, and half served as controls. After one, two, and three years, there were no differences between experimental and control schools on standardized measures of student reading or mathematics achievement, no differences on school climate, and no differences on principal or teacher retention.

So what happened? First, it is important to note that previous studies of principal professional development have also found zero (e.g., Jacob et al., 2014) or very small and inconsistent effects (e.g., Nunnery et al., 2011, 2016). Second, numerous studies of certain types of professional development for teachers have also found very small or zero impacts. For example, a review of research on elementary mathematics programs by Pellegrini et al. (2019) identified 12 qualifying studies of professional development for mathematics content and pedagogy. The average effect size was essentially zero (ES=+0.04).

What does work in professional development?

In sharp contrast to these dismal findings, there are many forms of professional development that work very well. For example, in the Pellegrini et al. (2019) mathematics review, professional development designed to teach teachers to use specific instructional processes was very effective, averaging ES=+0.25. These included studies of cooperative learning, classroom management strategies, and individualized instruction. In fact, other than one-to-one and one-to-small group tutoring, no other type of approach was as effective. In a review of research on programs for elementary struggling readers by Inns et al. (2019), programs incorporating cooperative learning had an effect size of +0.29, more effective than any other programs except tutoring. A review of research on secondary reading programs by Baye et al. (2018) found that cooperative learning programs, whole-school models incorporating cooperative learning, and writing-focused models also incorporating cooperative learning had larger impacts than anything other than tutoring.

How can it be that professional development on cooperative learning and classroom management are so much more effective than professional development on content, pedagogy, and general teaching strategies?

One reason, I would submit, is that it is very difficult to teach someone to improve practices that they already know how to do. For example, if as an adult you took a course in tennis or golf or sailing or bridge, you probably noticed that you learned very rapidly, retained what you learned, and quickly improved your performance in that new skill. Contrast this with a course on dieting or parenting. The problem with improving your eating or parenting is that you already know very well how to eat, and if you already have kids, you know how to parent. You could probably stand some improvement in these areas, which is why you took the course, but no matter how motivated you are to improve, over time you are likely to fall back on well-established routines, or even bad habits. The same is true of teaching. Early in their careers teachers develop routine ways of performing each of the tasks of teaching: lecturing, planning, praising, dealing with misbehavior, and so on. Teachers know their content and settle into patterns of communicating that content to students. Then one day a professional developer shows up, who watches teachers teaching and gives them advice. The advice might take, but quite often teachers give it a try, run into difficulties, and then settle back into comfortable routines.

Now consider a more specific, concrete set of strategies that are distinctly different from what teachers typically do: cooperative learning. Teachers can readily learn the key components. They put their students in mixed groups of four or five. After an initial lesson, they give students opportunities to work together to make sure that everyone can succeed at the task. Teachers observe and assist students during team practice. They assess student learning, and celebrate student success. Every one of these components is a well-defined, easily learned, and easily observed step. Teachers need training and coaching to succeed at first, but after a while, cooperative learning itself becomes second nature. It helps that almost all kids love to be noisy and engaged, and love to work with each other, so they are rooting for the teacher to succeed. But for most teachers, structured cooperative learning is distinctly different from ordinary teaching, so it is easy to learn and maintain.


As another example, consider classroom management strategies used in many programs. Trainers show teachers how to use Popsicle sticks with kids’ names on them to call on students, so all kids have to pay attention in case they are called. To get students’ immediate attention, teachers may learn to raise their hands and have students raise theirs, or to ring a bell, or to say a phrase like “one, two, three, look at me.” Teachers may learn to give points to groups or individuals who are meeting class expectations. They may learn to give students or groups privileges, such as lining up first to go outside or having the privilege of selecting and leading their favorite team or class cheer. These and many other teacher behaviors are clear, distinct, easily learned, and immediately solve persistent problems of low-level disturbances.

The point is not that these cooperative learning or classroom management strategies are more important than content knowledge or pedagogy. However, they are easily learned, retained, and institutionalized ways of solving critical daily problems of teaching, and they are so well-defined and clear that when they have started working, teachers are likely to hold on to them indefinitely and are unlikely to fall back on other strategies that may be less effective but are already deeply ingrained.

I am not suggesting that only observable, structural classroom reforms such as cooperative learning or classroom management strategies are good uses of professional development resources. All aspects of teaching need successive improvement, of course. But I am using these examples to illustrate why certain types of professional development are very difficult to make effective. It may be that improving the content and pedagogy teachers use day in and day out may require more concrete, specific strategies. I hope developers and researchers will create and successfully evaluate such new approaches, so that teachers can continually improve their effectiveness in all areas. But there are whole categories of professional development that research repeatedly finds are just not working. Researchers and educators need to focus on why this is true, and then design new PD strategies that are less subtle, more observable, and deal more with actual teacher and student behavior.

References

Hermann, M., Clark, M., James-Burdumy, S., Tuttle, C., Kautz, T., Knechtel, V., Dotter, D., Wulsin, C.S., & Deke, J. (2019). The effects of a principal professional development program focused on instructional leadership (NCEE 2020-0002). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Jacob, R., Goddard, K., Miller, R., & Goddard, Y. (2014). Exploring the causal impact of the McREL Balanced Leadership Program on leadership, principal efficacy, instructional climate, educator turnover, and student achievement. Educational Evaluation and Policy Analysis, 52, 187-220.

Nunnery, J., Ross, S., Chappel, S., Pribesh, S., & Hoag-Carhart, E. (2011). The impact of the National Institute for School Leadership’s Executive Development Program on school performance trends in Massachusetts: Cohort 2 Results. Norfolk, VA: Center for Educational Partnerships, Old Dominion University.

Nunnery, J., Ross, S., & Reilly, J. (2016). An evaluation of the National Institute for School Leadership: Executive Development Program in Milwaukee Public Schools. Norfolk, VA: Center for Educational Partnerships, Old Dominion University.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. (2019). Effective programs in elementary mathematics: A best-evidence synthesis. Available at www.bestevidence.org. Manuscript submitted for publication.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence-Based Reform and the Multi-Academy Trust

Recently, I was in England to visit Success for All (SFA) schools there. I saw two of the best SFA schools I’ve ever seen anywhere: Applegarth Primary School in Croydon, south of London, and Houldsworth Primary School in Sussex, southeast of London. Both are very high-poverty schools with histories of poor achievement, violence, and high staff turnover. Applegarth mostly serves the children of African immigrants, and Houldsworth mostly serves White students from very poor homes. Yet I saw every class in each school, and in each one, children were highly engaged, excited, and learning like crazy. Both schools were once in the lowest one percent of achievement in England, yet both are now performing at or above national norms.

In my travels, I often see outstanding Success for All schools. However, in this case I learned about an important set of policies that goes beyond Success for All, but could have implications for evidence-based reform more broadly.


Both Applegarth and Houldsworth are in multi-academy trusts (MATs), the STEP Trust and the Unity Trust, respectively. Academies are much like charter schools in the U.S., and multi-academy trusts are organizations that run more than one academy. Academies are far more common in the U.K. than in the U.S., constituting 22% of primary (i.e., elementary) schools and 68% of secondary schools. There are 1,170 multi-academy trusts, managing more than 5,000 of Britain’s 32,000 schools, or 16%. Multi-academy trusts can operate within a single local authority (school district), much as Success Academies does in New York City, or may operate across many local authorities. Quite commonly, poorly performing schools in a local authority, or stand-alone academies, are offered to a successful and capable multi-academy trust, and these hand-overs explain much of the growth in multi-academy trusts in recent years.

What I saw in the STEP and Unity Trusts was something extraordinary. In each case, the exceptional schools I saw were serving as lead schools for the dissemination of Success for All. Staff in these schools have an explicit responsibility to train and mentor future principals, facilitators, and teachers, who spend a year at the lead school learning about SFA and their roles in it, and then take on those roles in a new SFA school elsewhere in the multi-academy trust. Over time, there are multiple lead schools, each of which takes responsibility for mentoring new SFA schools other than their own. This cascading dissemination strategy, carried out in close partnership with the national SFA-UK non-profit organization, is likely to produce exceptional implementations.

I’m sure there must be problems with multi-academy trusts that I don’t know about, and in the absence of data on MATs throughout Britain, I would not take a position on them in general. But based on my limited experience with the STEP and Unity Trusts, this policy has particular potential as a means of disseminating very effective forms of programs proven effective in rigorous research.

First, multi-academy trusts have the opportunity and motivation to establish themselves as effective. Ordinary U.S. districts want to do well, of course, but they do not grow (or shrink) because of their success (or lack of it). In contrast, a multi-academy trust in the U.K. is more likely to seek out proven programs and implement them with care and competence, both to increase student success and to establish a “brand” based on its effective use of proven programs. Both the STEP and Unity Trusts are building reputations for succeeding with difficult schools using methods known to be effective. Using cascading professional development and mentoring from established schools to new ones, a multi-academy trust can build both effectiveness and reputation.

Although the schools I saw were using Success for All, any multi-academy trust could use any proven program or programs to create positive outcomes and expand its reach and influence. As other multi-academy trusts see what the pioneers are accomplishing, they may decide to emulate them. One major advantage of multi-academy trusts is that, in sharp contrast to U.S. school districts, especially large urban ones, they are likely to remain under consistent leadership for many years. Leaders of multi-academy trusts, and their staff and supporters, are likely to have the time to transform practices gradually, knowing that they have the stable leadership needed for long-term change.

There is no magic in school governance arrangements, and no guarantee that many multi-academy trusts will use the available opportunities to implement and perfect proven strategies. Yet by their nature, multi-academy trusts have the opportunity to make a substantial difference in the education provided to all students, especially in trusts serving mostly disadvantaged students. I look forward to watching plans unfold in the STEP and Unity Trusts, and to learning more about how the academy movement in the U.K. might provide a path toward widespread and thoughtful use of proven programs, benefiting very large numbers of students. And I’d love to see more U.S. charter networks and traditional school districts use cascading replication to scale up proven, whole-school approaches likely to improve outcomes in disadvantaged schools.

Photo credit: Kindermel [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)]

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Achieving Audacious Goals in Education: Amundsen and the Fram

On a recent trip to Norway, I visited the Fram Museum in Oslo. The Fram was Roald Amundsen’s ship, used to transport a small crew to Antarctica for the 1911 South Pole expedition. The museum is built around the Fram itself, and visitors can go aboard this amazing ship, surrounded by information and displays about polar exploration. What was most impressive about the Fram is the meticulous attention to detail in every aspect of the expedition. Amundsen had undertaken earlier voyages to the polar seas in preparation, and had carefully studied the experiences of other polar explorers. The ship’s hull was specially built to withstand crushing from the shifting of polar ice. He carried many huskies to pull sleds over the ice, and trained them to work in teams. Every possible problem was carefully anticipated in light of experience, and exact amounts of food for men and dogs were allocated and stored. Amundsen said that forgetting “a single trouser button” could doom the effort. As it unfolded, everything worked as anticipated, and the men returned safely after reaching the South Pole.

From At the South Pole by Roald Amundsen, 1913 [Public domain]
The story of Amundsen and the Fram is an illustration of how to overcome major obstacles to achieve audacious goals. I’d like to build on it to return to a topic I’ve touched on in two previous blogs. The audacious goal: overcoming the substantial gaps in elementary reading achievement between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic and non-Hispanic students. According to the National Assessment of Educational Progress (NAEP), each of these gaps is about one half of a standard deviation, also known as an effect size of +0.50. This is a very large gap, but it has been overcome in a very small number of intensive programs. These programs were able to increase the achievement of disadvantaged students by an effect size of more than +0.50, but few were able to reproduce these gains under normal circumstances. Our goal is to enable thousands of ordinary schools serving disadvantaged students to achieve such outcomes, at a cost of no more than 5% beyond ordinary per-pupil costs.
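To translate that number (a standard property of the normal curve, not a new finding): if achievement is roughly normally distributed, an effect size of +0.50 moves the average student in the target group from the 50th to about the 69th percentile of the comparison distribution, since

```latex
\Phi(0.50) \approx 0.69
```

where \Phi is the standard normal cumulative distribution function. Closing the gaps described above means moving whole groups of students by about that much.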

Educational Reform and Audacious Goals

Researchers have long been creating and evaluating many different approaches to improving reading achievement. This is necessary in the research and development process to find “what works” and build up from there. However, each individual program or practice has a modest effect on key outcomes, and we rarely combine proven programs to achieve an effect large enough to, for example, overcome the achievement gap. This is not what Amundsen, the Wright Brothers, or the worldwide team that achieved the eradication of smallpox did. Instead, they set audacious goals and kept at them systematically, using what works, until the goals were achieved.

I would argue that we should and could do the same in education. The reading achievement gap is the largest problem of educational practice and policy in the U.S. We need to use everything we know how to do to solve it. This means stating in advance that our goal is to find strategies capable of eliminating reading gaps at scale, and refusing to declare victory until this goal is achieved. We need to establish that the goal can be achieved, by ordinary teachers and principals in ordinary schools serving disadvantaged students.

Tutoring Our Way to the Goal

In a previous blog I proposed that the goal of +0.50 could be reached by providing disadvantaged, low-achieving students tutoring in small groups or, when necessary, one-to-one. As I argued there and elsewhere, there is no reading intervention as effective as tutoring. Recent reviews of research have found that well-qualified teaching assistants using proven methods can achieve outcomes as good as those achieved by certified teachers working as tutors, thereby making tutoring much less expensive and more replicable (Inns et al., 2019). Providing schools with significant numbers of well-trained tutors is one likely means of reaching ES=+0.50 for disadvantaged students. Inns et al. (2019) found an average effect size of +0.38 for tutoring by teaching assistants, but several programs had effect sizes of +0.40 to +0.47. This is not +0.50, but it is within striking distance of the goal. However, each school would need multiple tutors in order to provide high-quality tutoring to most students, to extend the known positive effects of tutoring to the whole school.

Combining Intensive Tutoring With Success for All

Tutoring may be sufficient by itself, but research on tutoring has rarely used tutoring schoolwide, to benefit all students in high-poverty schools. It may be more effective to combine widespread tutoring for students who most need it with other proven strategies designed for the whole school, rather than simply extending a program designed for individuals and small groups. One logical strategy to reach the goal of +0.50 in reading might be to combine intensive tutoring with our Success for All whole-school reform model.

Success for All adds to intensive tutoring in several ways. It provides teachers with professional development on proven reading strategies, as well as cooperative learning and classroom management strategies at all levels. Strengthening core reading instruction reduces the number of children at great risk, and even for students who are receiving tutoring, it provides a setting in which students can apply and extend their skills. For students who do not need tutoring, Success for All provides acceleration. In high-poverty schools, students who are meeting reading standards are likely to still be performing below their potential, and improving instruction for all is likely to help these students excel.

Success for All was created in the late 1980s in an attempt to achieve a goal similar to the +0.50 challenge. In its first major evaluation, a matched study in six high-poverty Baltimore elementary schools, Success for All achieved a schoolwide reading effect size of at least +0.50 in grades 1-5 on individually administered reading measures. For students in the lowest 25% of the sample at pretest, the effect size averaged +0.75 (Madden et al., 1993). That experiment provided two to six certified teacher tutors per school, who worked one to one with the lowest-achieving first and second graders. The tutors supplemented a detailed reading program, which used cooperative learning, phonics, proven classroom management methods, parent involvement, frequent assessment, distributed leadership, and other elements (as Success for All still does).

An independent follow-up assessment found that the effects were maintained through eighth grade, and also showed a halving of retentions in grade and a halving of assignments to special education, compared to the control group (Borman & Hewes, 2002). Schools using Success for All since that time have rarely been able to afford so many tutors, instead averaging one or two, and many SFA schools have not been able to afford even one tutor. Still, across 28 qualifying studies, mostly by third parties, the Success for All effect size has averaged +0.27 (Cheung et al., in press). This is impressive, but it is not +0.50. For the lowest achievers, the mean effect size was +0.62, but again, our goal is +0.50 for all disadvantaged students, not just the lowest achievers.

Over a period of years, could schools using Success for All with five or more teaching assistant tutors reach the +0.50 goal? I’m certain of it. Could we go even further, perhaps creating a similar approach for secondary schools or adding in an emphasis on mathematics? That would be the next frontier.

The Policy Importance of +0.50

If we can routinely achieve an effect size of +0.50 in reading in most Title I schools, this would pose a real challenge for policy makers. Many policy makers argue that money does not make much difference in education, or that housing, employment, and other basic economic improvements are needed before major improvements in the education of disadvantaged students will be possible. But what if it became widely known that outcomes in high-poverty schools could be reliably and substantially improved at a cost that is modest compared to the outcomes achieved? Policy makers would, one hopes, focus on finding ways to provide the needed resources if they could be confident in the outcomes.

As Amundsen knew, difficult goals can be attained with meticulous planning and high-quality implementation. Every element of his expedition had been tested extensively in real arctic conditions, and had been found to be effective and practical. We would propose taking a similar path to universal success in reading. Each component of a practical plan to reach an effect size of +0.50 or more must be proven effective in schools serving many disadvantaged students. By combining proven approaches, we can add enough to the reading achievement of disadvantaged students to enable them to perform as well as their middle-class peers. It just takes an audacious goal and the commitment and resources to accomplish it.

References

Borman, G., & Hewes, G. (2002).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Cheung, A., Xie, C., Zhang, T., & Slavin, R. E. (in press). Success for All: A quantitative synthesis of evaluations. Education Research Review.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Available at www.bestevidence.org. Manuscript submitted for publication.

Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L., & Wasik, B. (1993). Success for All: Longitudinal effects of a schoolwide elementary restructuring program. American Educational Research Journal, 30, 123-148.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of technology-assisted small-group tutoring for struggling readers. Reading & Writing Quarterly, 1-8. http://dx.doi.org/10.1080/10573569.2016.1255577

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

On Replicability: Why We Don’t Celebrate Viking Day

I was recently in Oslo, Norway’s capital, and visited a wonderful museum displaying three Viking ships that had been buried with important people. The museum had all sorts of displays focused on the amazing exploits of Viking ships, always including the Viking landings in Newfoundland, about 500 years before Columbus. Since the 1960s, most people have known that Vikings, not Columbus, were the first Europeans to land in America. So why do we celebrate Columbus Day, not Viking Day?

Given the bloodthirsty actions of Columbus, easily rivaling those of the Vikings, we surely don’t prefer one to the other based on their charming personalities. Instead, we celebrate Columbus Day because what Columbus did was far more important. The Vikings knew how to get back to Newfoundland, but they were secretive about it. Columbus was eager to publicize and repeat his discovery. It was this focus on replication that opened the door to regular exchanges. The Vikings brought back salted cod. Columbus brought back a new world.

In educational research, academics often imagine that if they establish new theories or demonstrate new methods on a small scale, and then publish their results in reputable journals, their job is done. Call this the Viking model: they got what they wanted (promotions or salt cod), and who cares if ordinary people found out about it? Even if the Vikings had published their findings in the Viking Journal of Exploration, this would have had roughly the same effect as educational researchers publishing in their own research journals.

Columbus, in contrast, told everyone about his voyages, and very publicly repeated and extended them. His brutal leadership ended with him being sent back to Spain in chains, but his discoveries had resounding impacts that long outlived him.


Educational researchers only want to do good, but they are unlikely to have any impact at all unless they can make their ideas useful to educators. Many educational researchers would love to make their ideas into replicable programs, evaluate these programs in schools, and if they are found to be effective, disseminate them broadly. However, resources for the early stages of development and research are scarce. Yes, the Institute of Education Sciences (IES) and Education Innovation Research (EIR) fund a lot of development projects, and Small Business Innovation Research (SBIR) provides small grants for this purpose to for-profit companies. Yet these funders support only a tiny proportion of the proposals they receive. In England, the Education Endowment Foundation (EEF) spends a lot on randomized evaluations of promising programs, but very little on development or early-stage research. Innovations that are funded by government or other funders very rarely end up being evaluated in large experiments, fewer still are found to be effective, and vanishingly few eventually enter widespread use. The exceptions are generally programs created by large for-profit companies, large and entrepreneurial non-profits, or other entities with proven capacity to develop, evaluate, support, and disseminate programs at scale. Even the most brilliant developers and researchers rarely have the interest, time, capital, business expertise, or infrastructure to nurture effective programs through all the steps necessary to bring a practical and effective program to market. As a result, most educational products introduced at scale to schools come from commercial publishers or software companies, who have the capital and expertise to create and disseminate educational programs, but serve a market that primarily wants attractive, inexpensive, easy-to-use materials, software, and professional development, and is not (yet) willing to pay for programs proven to be effective. I discussed this problem in a recent blog on technology, but the same dynamics apply to all innovations, tech and non-tech alike.

How Government Can Promote Proven, Replicable Programs

There is an old saying that Columbus personified the spirit of research. He didn’t know where he was going, he didn’t know where he was when he got there, and he did it all on government funding. The relevant part of this is the government funding. In Columbus’ time, only royalty could afford to support his voyage, and his grant from Queen Isabella was essential to his success. Yet Isabella was not interested in pure research. She was hoping that Columbus might open rich trade routes to the (east) Indies or China, or might find gold or silver, or might acquire valuable new lands for the crown (all of these things did eventually happen). Educational research, development, and dissemination face a similar situation. Because education is virtually a government monopoly, only government is capable of sustained, sizable funding of research, development, and dissemination, and only the U.S. government has the acknowledged responsibility to improve outcomes for the 50 million American children ages 4-18 in its care. So what can government do to accelerate the research-development-dissemination process?

  1. Contract with “seed bed” organizations capable of identifying and supporting innovators with ideas likely to make a difference in student learning. These organizations might be rewarded, in part, based on the number of proven programs they are able to help create, support, and (if effective) ultimately disseminate.
  2. Contract with independent third-party evaluators capable of doing rigorous evaluations of promising programs. These organizations would evaluate promising programs from any source, not just from seed bed companies, as they do now in IES, EIR, and EEF grants.
  3. Provide funding for innovators with demonstrated capacity to create programs likely to be effective, and funding to disseminate those programs if they are proven effective. Developers may also contract with “seed bed” organizations to help them succeed with development and dissemination.
  4. Provide information and incentive funding to schools to encourage them to adopt proven programs, as described in a recent blog on technology.  Incentives should be available on a competitive basis to a broad set of schools, such as all Title I schools, to engage many schools in adoption of proven programs.

Evidence-based reform in education has made considerable progress in the past 15 years, both in finding positive examples that are in use today and in finding out what is not likely to make substantial differences. It is time for this movement to go beyond its early achievements to enter a new phase of professionalism, in which collaborations among developers, researchers, and disseminators can sustain a much faster and more reliable process of research, development, and dissemination. It’s time to move beyond the Viking stage of exploration to embrace the good parts of the collaboration between Columbus and Queen Isabella that made a substantial and lasting change in the whole world.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.