Effect Sizes and Additional Months of Gain: Can’t We Just Agree That More is Better?

In the 1984 mockumentary This is Spinal Tap, there is a running joke about a hapless band, Spinal Tap, which proudly bills itself “Britain’s Loudest Band.”  A pesky reporter keeps asking the band’s leader, “But how can you prove that you are Britain’s loudest band?” The band leader explains, with declining patience, that while ordinary amplifiers’ sound controls only go up to 10, Spinal Tap’s go up to 11.  “But those numbers are arbitrary,” says the reporter.  “They don’t mean a thing!”  “Don’t you get it?” asks the band leader.  “ELEVEN is more than TEN!  Anyone can see that!”

In educational research, we have an ongoing debate reminiscent of Spinal Tap.  Educational researchers speaking to other researchers invariably express the impact of educational treatments as effect sizes (the difference in adjusted means for the experimental and control groups divided by the unadjusted standard deviation).  All else being equal, higher effect sizes are better than lower ones.
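Concretely, the definition above amounts to a one-line formula. Here is a minimal sketch of it in code (the numbers are made up for illustration, not drawn from any study discussed here):

```python
def effect_size(mean_experimental: float, mean_control: float, sd: float) -> float:
    """Effect size as defined above: the difference in adjusted means for the
    experimental and control groups, divided by the unadjusted standard deviation."""
    return (mean_experimental - mean_control) / sd

# Hypothetical example: a 2-point difference on a test whose SD is 25
print(effect_size(102.0, 100.0, 25.0))  # 0.08
```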

However, educators who are not trained in statistics often despise effect sizes.  “What do they mean?” they ask.  “Tell us how much difference the treatment makes in student learning!”

Researchers want to be understood, so they try to translate effect sizes into more educator-friendly equivalents.  The problem is that the friendlier the units, the more statistically problematic they are.  The friendliest of all is “additional months of learning.”  Researchers or educators can look at a chart and, for any particular effect size, find the corresponding number of “additional months of learning.”  The Education Endowment Foundation in England, which funds and reports on rigorous experiments, reports both effect sizes and additional months of learning, and provides tables to help people make the conversion.  But here’s the rub.  A recent article by Baird & Pane (2019) compared additional months of learning to three other translations of effect sizes.  Additional months of learning was rated highest in ease of use, but lowest in four other categories, such as transparency and consistency. For example, a month of learning clearly has a different meaning in kindergarten than it does in tenth grade.

The other translations rated higher by Baird and Pane were, at least to me, just as hard to understand as effect sizes.  For example, the What Works Clearinghouse presents, along with effect sizes, an “improvement index” that has the virtue of being equally incomprehensible to researchers and educators alike.

On one hand, arguing about outcome metrics is as silly as arguing about the relative virtues of Fahrenheit and Celsius. If one can be directly transformed into the other, who cares?

However, additional months of learning is often used to cover up very low effect sizes. I recently ran into an example of this in a series of studies by the Stanford Center for Research on Education Outcomes (CREDO), in which disadvantaged urban African American students in charter schools gained 59 more “days of learning” in math than matched students not in charters, and 44 more days in reading. These numbers were cited in an editorial praising charter schools in the May 29 Washington Post.

However, these “days of learning” are misleading. The effect size for this same comparison was only +0.08 for math, and +0.06 for reading. Any researcher will tell you that these are very small effects. They were only made to look big by reporting the gains in days. Reporting in days not only magnifies the apparent differences, but also makes them unstable. Would it interest you to know that White students in urban charter schools performed 36 days a year worse than matched students in math (ES = -0.05) and 14 days worse in reading (ES = -0.02)? How about Native American students in urban charter schools, whose scores were 70 days worse than matched students in non-charters in math (ES = -0.10), and equal in reading? I wrote about charter school studies in a recent blog. In that blog, I did not argue that charter schools are effective for disadvantaged African Americans but harmful for Whites and Native Americans. That seems unlikely. What I did argue is that the effects of charter schools are so small that the directions of the effects are unstable. The overall effects across all urban schools studied were only 40 days (ES = +0.055) in math and 28 days (ES = +0.04) in reading. These effects look big because of the “days of learning” transformation, but they are not.
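To see just how much this transformation magnifies small effects, here is a minimal Python sketch (my own illustration, not CREDO’s published method) that back-computes the conversion factor implied by the numbers above. Every pair implies roughly 700-740 “days of learning” per standard deviation, so even a near-zero effect size turns into an impressive-sounding number of days.

```python
# Back-computing the "days of learning" conversion implied by the
# effect size / days pairs quoted above. This is my own illustration;
# CREDO's exact conversion method is not shown in this post.

reported = [  # (effect size, "days of learning")
    (0.08, 59), (0.06, 44), (-0.05, -36), (-0.02, -14),
    (-0.10, -70), (0.055, 40), (0.04, 28),
]

for es, days in reported:
    print(f"ES {es:+.3f} -> {days:+d} days (~{days / es:.0f} days per SD)")

# With a factor that large, a near-zero effect still sounds substantial:
DAYS_PER_SD = 720  # approximate factor implied above (an assumption)
print(f"ES +0.01 becomes {0.01 * DAYS_PER_SD:.0f} 'days of learning'")
```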

In This is Spinal Tap, the argument about whether or not Spinal Tap is Britain’s loudest band is absurd.  Any band can turn its amplifiers to the top and blow out everyone’s eardrums, whether the top is marked eleven or ten.  In education, however, it does matter a great deal that educators are taking evidence into account in their decisions about educational programs. Using effect sizes, perhaps supplemented by additional months of learning, is one way to help readers understand outcomes of educational experiments. Using “days of learning,” however, is misleading, making very small impacts look important. Why not additional hours or minutes of learning, while we’re at it? Spinal Tap would be proud.

References

Baird, M. D., & Pane, J. F. (2019). Translating standardized effects of education programs into more interpretable metrics. Educational Researcher. Advance online publication. https://doi.org/10.3102/0013189X19848729

CREDO (2015). Overview of the Urban Charter School Study. Stanford, CA: Author.

Denying poor children a chance [Editorial]. (2019, May 29). The Washington Post, p. A16.

 

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Charter Schools? Smarter Schools? Why Not Both?

I recently saw an editorial in the May 29 Washington Post, entitled “Denying Poor Children a Chance,” a pro-charter school opinion piece that makes dire predictions about the damage to poor and minority students that would follow if charter expansion were to be limited.  In education, it is common to see evidence-free opinions for and against charter schools, so I was glad to see actual data in the Post editorial.   In my view, if charter schools could routinely and substantially improve student outcomes, especially for disadvantaged students, I’d be a big fan.  My response to charter schools is the same as my response to everything else in education: Show me the evidence.

The Washington Post editorial cited a widely known 2015 Stanford CREDO study comparing urban charter schools to matched traditional public schools (TPS) in the same districts.  Evidence always attracts my attention, so I decided to look into this and other large, multi-district studies. Despite the Post’s enthusiasm for the data, the average effect size was only +0.055 for math and +0.04 for reading.  By anyone’s standards, these are very, very small outcomes.  Outcomes for poor, urban, African American students were somewhat higher, at +0.08 for math and +0.06 for reading, but on the other hand, average effect sizes for White students were negative, averaging -0.05 for math and -0.02 for reading.  Outcomes were also negative for Native American students: -0.10 for math, zero for reading.  With effect sizes so low, these small differences are probably just different flavors of zero.  A CREDO (2013) study of charter schools in 27 states, including non-urban as well as urban schools, found average effect sizes of +0.01 for math and -0.01 for reading. How much smaller can you get?

In fact, the CREDO studies have been widely criticized for using techniques that inflate test scores in charter schools.  They compare students in charter schools to students in traditional public schools, matching on pretests and ethnicity.  This ignores the obvious fact that students in charter schools chose to go there, or their parents chose for them.  There is every reason to believe that students who choose to attend charter schools are, on average, higher-achieving, more highly motivated, and better behaved than students who stay in traditional public schools.  Gleason et al. (2010) found that students who applied to charter schools started off 16 percentage points higher in reading and 13 percentage points higher in math than others in the same schools who did not apply.  Applicants were more likely to be White, less likely to be African American or Hispanic, and less likely to qualify for free lunch.  Self-selection is a particular problem in studies of students who choose or are sent to “no-excuses” charters, such as KIPP or Success Academies, because the students or their parents know students will be held to very high standards of behavior and accomplishment, and may be encouraged to leave the school if they do not meet those standards.  (This is not a criticism of KIPP or Success Academies, but even when such charter systems use lotteries to select students, the students who show up for the lottery were at least motivated enough to enter a lottery for a very demanding school.)

Well-designed studies of charter schools usually focus on schools that use lotteries to select students, and then they compare the students who were successful in the lottery to those who were not so lucky.  This eliminates the self-selection problem, as students were selected by a random process.  The CREDO studies do not do this, and this may be why they report higher (though still very small) effect sizes than syntheses of studies in which all students applied to charters but were “lotteried in” or “lotteried out” at random.  A very rigorous WWC synthesis of such studies by Gleason et al. (2010) found that middle school students who were lotteried into charter schools in 32 states performed non-significantly worse than those lotteried out, in math (ES = -0.06) and in reading (ES = -0.08).  A 2015 update of the WWC study (Clark et al., 2015) found very similar, slightly negative outcomes in reading and math.

It is important to note that “no-excuses” charter schools, mentioned earlier, have had more positive outcomes than other charters.  A recent review of lottery studies by Cheng et al. (2017) found effect sizes of +0.25 for math and +0.17 for reading.  However, such “no-excuses” charters are a tiny percentage of all charters nationwide.


Other meta-analyses of studies of achievement outcomes of charter schools also exist, but none found effect sizes as high as the CREDO urban study.  The means of +0.055 for math and +0.04 for reading represent upper bounds for effects of urban charter schools.

Charter Schools or Smarter Schools?

So far, every study of charter school effects has compared achievement in charters to achievement in traditional public schools.  However, this should not be the only question.  “Charters” and “non-charters” do not exhaust the range of possibilities.

What if we instead ask this question: Among the range of programs available, which are most likely to be most effective at scale?

To illustrate the importance of this question, consider a study in England, which evaluated a program called Engaging Parents Through Mobile Phones.  The program involves texting parents on cell phones to alert them to upcoming tests, inform them about whether students are completing their homework, and tell them what students were being taught in school.  A randomized evaluation (Miller et al., 2016) found effect sizes of +0.06 for math and +0.03 for reading, remarkably similar to the urban charter school effects reported by CREDO (2015).  The cost of the mobile phone program was £6 per student per year, or about $7.80.  If you like the outcomes of charter schools, might you prefer to get the same outcomes for $7.80 per child per year, without all the political, legal, and financial stresses of charter schools?

The point here is that rather than arguing about the size of small charter effects, one could consider charters a “treatment” and compare them to other proven approaches.  In our Evidence for ESSA website, we list 112 reading and math programs that meet ESSA standards for “Strong,” “Moderate,” or “Promising” evidence of effectiveness.  Of these, 107 had effect sizes larger than those CREDO (2015) reports for urban charter schools.  In both math and reading, there are many programs with average effect sizes of +0.20, +0.30, up to more than +0.60.  If applied as they were in the research, the best of these programs could, for example, entirely overcome Black-White and Hispanic-White achievement gaps in one or two years.

A few charter school networks have their own proven educational approaches, but the many charters that do not have proven programs should be looking for them.  Most proven programs work just as well in charter schools as they do in traditional public schools, so there is no reason existing charter schools should not proactively seek proven programs to increase their outcomes.  For new charters, wouldn’t it make sense for chartering agencies to encourage charter applicants to systematically search for and propose to adopt programs that have strong evidence of effectiveness?  Many charter schools already use proven programs.  In fact, there are several that specifically became charters to enable them to adopt or maintain our Success for All whole-school reform program.

There is no reason for any conflict between charter schools and smarter schools.  The goal of every school, regardless of its governance, should be to help students achieve their full potential, and every leader of a charter or non-charter school would agree with this. Whatever we think about governance, all schools, traditional or charter, should get smarter, using proven programs of all sorts to improve student outcomes.

References

Cheng, A., Hitt, C., Kisida, B., & Mills, J. N. (2017). “No excuses” charter schools: A meta-analysis of the experimental evidence on student achievement. Journal of School Choice, 11(2), 209-238.

Clark, M. A., Gleason, P. M., Tuttle, C. C., & Silverberg, M. K. (2015). Do charter schools improve student achievement? Educational Evaluation and Policy Analysis, 37(4), 419-436.

Gleason, P. M., Clark, M. A., Tuttle, C. C., & Dwoyer, E. (2010). The evaluation of charter school impacts. Washington, DC: What Works Clearinghouse.

Miller, S., Davison, J., Yohanis, J., Sloan, S., Gildea, A., & Thurston, A. (2016). Texting parents: Evaluation report and executive summary. London: Education Endowment Foundation.

Denying poor children a chance [Editorial]. (2019, May 29). The Washington Post, p. A16.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Can Computers Teach?

Something’s coming

I don’t know

What it is

But it is

Gonna be great!

-Something’s Coming, West Side Story

For more than 40 years, educational technology has been on the verge of transforming educational outcomes for the better. The song “Something’s Coming,” from West Side Story, captures the feeling. We don’t know how technology is going to solve our problems, but it’s gonna be great!

Technology Counts is an occasional section of Education Week. Usually, it publishes enthusiastic predictions about the wonders around the corner, in line with its many advertisements for technology products of all kinds. So it was a bit of a shock to see the most recent edition, dated April 24. An article entitled “U.S. Teachers Not Seeing Tech Impact,” by Benjamin Herold, reported on a nationally representative survey of 700 teachers. The teachers reported huge purchases of digital devices, software, learning apps, and other technology in their schools in the past three years. That’s not news, if you’ve been in schools lately. But if you think technology is doing “a lot” to support classroom innovation, you’re out of step with most of the profession. Only 29% of teachers would agree with you; 41% say “some,” 26% “a little,” and 4% “none.” Equally modest proportions say that technology has “changed their work as a teacher.” The Technology Counts articles describe most teachers as using technology to help them do what they have always done, rather than to innovate.

There are lots of useful things technology can do, such as teaching students to use computers, and it may make some tasks easier for teachers and students. But from the earliest days of educational technology, everyone has hoped that computers would help students learn traditional subjects, such as reading and math. Do they?


The answer is, not so much. The table below shows average effect sizes for technology programs in reading and math, using data from four recent rigorous reviews of research. Three of these have been posted at www.bestevidence.org. The fourth, on reading strategies for all students, will be posted in the next few weeks.

Mean Effect Sizes for Applications of Technology in Reading and Mathematics

Category                                   Number of Studies   Mean Effect Size
Elementary Reading                         16                  +0.09
Elementary Reading – Struggling Readers     6                  +0.05
Secondary Reading                          23                  +0.08
Elementary Mathematics                     14                  +0.07
Study-Weighted Mean                        59                  +0.08

An effect size of +0.08, which is the average across the four reviews, is not zero. But it is not much. It is certainly not revolutionary. Also, the effects of technology are not improving over time.

As a point of comparison, tutoring by teaching assistants has shown the following average effect sizes:

Category                                   Number of Studies   Mean Effect Size
Elementary Reading – Struggling Readers     7                  +0.34
Secondary Reading                           2                  +0.23
Elementary Mathematics                     10                  +0.27
Study-Weighted Mean                        19                  +0.29

Tutoring by teaching assistants is more than 3 ½ times as effective as technology. Yet the cost difference between tutoring and technology, especially for effective one-to-small-group tutoring by teaching assistants, is not large.
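For readers who want to check the arithmetic, here is a short sketch (my own, using only the figures in the two tables above) showing how a study-weighted mean is computed. The same calculation applies to the other study-weighted means reported elsewhere in these posts.

```python
# Study-weighted mean: each category's mean effect size is weighted by
# its number of studies. Figures are taken from the two tables above.

technology = {  # category: (number of studies, mean effect size)
    "Elementary Reading": (16, 0.09),
    "Elementary Reading - Struggling Readers": (6, 0.05),
    "Secondary Reading": (23, 0.08),
    "Elementary Mathematics": (14, 0.07),
}

tutoring = {
    "Elementary Reading - Struggling Readers": (7, 0.34),
    "Secondary Reading": (2, 0.23),
    "Elementary Mathematics": (10, 0.27),
}

def study_weighted_mean(categories):
    """Sum of (studies x effect size) divided by the total number of studies."""
    total = sum(n for n, _ in categories.values())
    return sum(n * es for n, es in categories.values()) / total

print(f"Technology: {study_weighted_mean(technology):+.2f}")  # about +0.08
print(f"Tutoring:   {study_weighted_mean(tutoring):+.2f}")    # about +0.29
```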

Tutoring is not the only effective alternative to technology. Our reviews have identified many types of programs that are more effective than technology.

A valid argument for continuing with use of technology is that eventually, we are bound to come up with more effective technology strategies. It is certainly worthwhile to keep experimenting. But this argument has been made since the early 1970s, and technology is still not ready for prime time, at least as far as teaching reading and math are concerned. I still believe that technology’s day will come, when strategies to get the best from both teachers and technology will reliably be able to improve learning. Until then, let’s use programs and practices already proven to be effective, as we continue to work to improve the outcomes of technology.

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Could Proven Programs Eliminate Gaps in Elementary Reading Achievement?

What if every child in America could read at grade level or better? What if the number of students in special education for learning disabilities, or retained in grade, could be cut in half?

What if students who become behavior problems or give up on learning because of nothing more than reading difficulties could instead succeed in reading and no longer be frustrated by failure?

Today these kinds of outcomes are only pipe dreams. Despite decades of effort and billions of dollars directed toward remedial and special education, reading levels have barely increased.  Gaps between middle class and economically disadvantaged students remain wide, as do gaps between ethnic groups. We’ve done so much, you might think, and nothing has really worked at scale.

Yet today we have many solutions to the problems of struggling readers, solutions so effective that if widely and effectively implemented, they could substantially change not only the reading skills, but the life chances of students who are struggling in reading.


How do I know this is possible? The answer is that the evidence is there for all to see.

This week, my colleagues and I released a review of research on programs for struggling readers. The review, written by Amanda Inns, Cynthia Lake, Marta Pellegrini, and me, uses academic language and rigorous review methods. But you don’t have to be a research expert to understand what we found. In ten minutes, just reading this blog, you will know what needs to be done to have a powerful impact on struggling readers.

Everyone knows that there are substantial gaps in student reading performance according to social class and race. According to the National Assessment of Educational Progress, or NAEP, here are key gaps in terms of effect sizes at fourth grade:

Comparison                                      Gap in Effect Sizes
No Free/Reduced Lunch vs. Free/Reduced Lunch    0.56
White vs. African American                      0.52
White vs. Hispanic                              0.46

These are big differences. In order to eliminate these gaps, we’d have to provide schools serving disadvantaged and minority students with programs or services sufficient to increase their reading scores by about a half standard deviation. Is this really possible?

Can We Really Eliminate Such Big and Longstanding Gaps?

Yes, we can. And we can do it cost-effectively.

Our review examined thousands of studies of programs intended to improve the reading performance of struggling readers. We found 59 studies of 39 different programs that met very high standards of research quality. 73% of the qualifying studies used random assignment to experimental or control groups, just as the most rigorous medical studies do. We organized the programs into response to intervention (RTI) tiers:

Tier 1 means whole-class programs, not just for struggling readers.

Tier 2 means targeted services for students who are struggling to read.

Tier 3 means intensive services for students who have serious difficulties.

Our categories were as follows:

Multi-Tier (Tier 1 + tutoring for students who need it)

Tier 1:

  • Whole-class programs

Tier 2:

  • Technology programs
  • One-to-small group tutoring

Tier 3:

  • One-to-one tutoring

We are not advocating for RTI itself, because the data on RTI are unclear. But it is just common sense to use proven programs with all students, then proven remedial approaches with struggling readers, then intensive services for students for whom Tier 2 is not sufficient.

Do We Have Proven Programs Able to Overcome the Gaps?

The table below shows average effect sizes for specific reading approaches. Wherever you see effect sizes that approach or exceed +0.50, you are looking at proven solutions to the gaps, or at least programs that could become a component in a schoolwide plan to ensure the success of all struggling readers.

Programs That Work for Struggling Elementary Readers

Program                                                 Grades Proven   No. of Studies   Mean Effect Size
Multi-Tier Approaches
      Success for All                                   K-5             3                +0.35
      Enhanced Core Reading Instruction                 1               1                +0.24
Tier 1 – Classroom Approaches
      Cooperative Integrated Reading
        & Composition (CIRC)                            2-6             3                +0.11
      PALS                                              1               1                +0.65
Tier 2 – One-to-Small Group Tutoring
      Read, Write, & Type (T 1-3)                       1               1                +0.42
      Lindamood (T 1-3)                                 1               1                +0.65
      SHIP (T 1-3)                                      K-3             1                +0.39
      Passport to Literacy (TA 1-4/7)                   4               4                +0.15
      Quick Reads (TA 1-2)                              2-3             2                +0.22
Tier 3 – One-to-One Tutoring
      Reading Recovery (T)                              1               3                +0.47
      Targeted Reading Intervention (T)                 K-1             2                +0.50
      Early Steps (T)                                   1               1                +0.86
      Lindamood (T)                                     K-2             1                +0.69
      Reading Rescue (T or TA)                          1               1                +0.40
      Sound Partners (TA)                               K-1             2                +0.43
      SMART (PV)                                        K-1             1                +0.40
      SPARK (PV)                                        K-2             1                +0.51
Key:    T: Certified teacher tutors

TA: Teaching assistant tutors

PV: Paid volunteers (e.g., AmeriCorps members)

1-X: For small group tutoring, the usual group size for tutoring (e.g., 1-2, 1-4)

(For more information on each program, see www.evidenceforessa.org)

The table is a road map to eliminating the achievement gaps that our schools have wrestled with for so long. It only lists programs that succeeded at a high level, relative to others at the same tier levels. See the full report or www.evidenceforessa.org for information on all programs.

It is important to note that there is little evidence of the effectiveness of tutoring in grades 3-5. Almost all of the evidence is from grades K-2. However, studies done in England in secondary schools have found positive effects of three reading tutoring programs in the English equivalent of U.S. grades 6-7. These findings suggest that when well-designed tutoring programs for grades 3-5 are evaluated, they will also show very positive impacts. See our review on secondary reading programs at www.bestevidence.org for information on these English middle school tutoring studies. On the same website, you can also see a review of research on elementary mathematics programs, which reports that most of the successful studies of tutoring in math took place in grades 2-5, another indicator that reading tutoring is also likely to be effective in these grades.

Some of the individual programs have shown effects large enough to overcome gaps all by themselves if they are well implemented (i.e., ES = +0.50 or more). Others have effect sizes lower than +0.50 but if combined with other programs elsewhere on the list, or if used over longer time periods, are likely to eliminate gaps. For example, one-to-one tutoring by certified teachers is very effective, but very expensive. A school might implement a Tier 1 or multi-tier approach to solve all the easy problems inexpensively, then use cost-effective one-to-small group methods for students with moderate reading problems, and only then use one-to-one tutoring with the small number of students with the greatest needs.

Schools, districts, and states should consider the availability, practicality, and cost of these solutions to arrive at a workable solution. They then need to make sure that the programs are implemented well enough and long enough to obtain the outcomes seen in the research, or to improve on them.

But the inescapable conclusion from our review is that the gaps can be closed, using proven models that already exist. That’s big news, news that demands big changes.

Photo credit: Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Benchmark Assessments: Weighing the Pig More Often?

There is an old saying about educational assessment: “If you want to fatten a pig, it doesn’t help to weigh it more often.”

To be fair, it may actually help to weigh pigs more often, so the farmer knows whether they are gaining weight at the expected rate and can intervene in time if they are not.

It is surely correct that weighing pigs does no good in itself, but it may serve a diagnostic purpose. What matters is not the weighing, but rather what the farmer or veterinarian does based on the information provided by the weighing.


This blog is not, however, about porcine policy, but educational policy. In schools, districts, and even whole states, most American children take “benchmark assessments” roughly three to six times a year. These assessments are intended to tell teachers, principals, and other school leaders how students are doing, especially in reading and math. Ideally, benchmark assessments are closely aligned with state accountability tests, making it possible for school leaders to predict how whole grade levels are likely to do on the state tests early enough in the year to enable them to provide additional assistance in areas of need. The information might be as detailed as “fourth graders need help in fractions” or “English learners need help in vocabulary.”

Benchmark assessments are only useful if they improve scores on state accountability tests. Other types of intervention may be beneficial even if they do not make any difference in state test scores, but it is hard to see why benchmark assessments would be valuable if they do not in fact have any impact on state tests, or other standardized tests.

So here is the bad news: Research finds that benchmark assessments do not make any difference in achievement.

High-quality, large-scale randomized evaluations of benchmark assessments are relatively easy to do, and many have in fact been done. Use of benchmark assessments has been evaluated in elementary reading and math (see www.bestevidence.org). Here is a summary of the findings.

Category              Number of Studies   Mean Effect Size
Elementary Reading     6                  -0.02
Elementary Math        4                   0.00
Study-Weighted Mean   10                  -0.01

In a rational world, these findings would put an end to benchmark assessments, at least as they are used now. The average outcomes are not just small, they are zero. They use up a lot of student time and district money.

In our accountability-obsessed educational culture, how could use of benchmark assessments make no difference at all on the only measure they are intended to improve? I would suggest several possibilities.

First, perhaps the most likely, is that teachers and schools do not do much with the information from benchmark assessments. If you are trying to lose weight, you likely weigh yourself every day. But if you then make no systematic effort to change your diet or increase your exercise, then all those weighings are of little value. In education, the situation is much worse than in weight reduction, because teachers are each responsible for 20-30 students. Results of benchmark assessments are different for each student, so a school staff that learns that its fourth graders need improvement in fractions finds it difficult to act on this information. Some fourth graders in every school are excelling in fractions, some just need a little help, and some are struggling in fractions because they missed the prerequisite skills. “Teach more fractions” is not a likely solution except for some of that middle group, yet differentiating instruction for all students is difficult to do well.

Another problem is that it takes time to score and return benchmark assessments, so by the time a team of teachers decides how to respond to benchmark information, the situation has moved on.

Third, benchmark assessments may add little because teachers and principals already know a lot more about their students than any test can tell them. Imagine a principal receiving the information that her English learners need help in vocabulary. I’m going to guess that she already knows that. But more than that, she and her teachers know which English learners need what kind of vocabulary, and they have other measures and means of finding out. Teachers already give a lot of brief, targeted curriculum-linked assessments, and they always have. Further, wise teachers stroll around and listen in on students working in cooperative groups, or look at their tests or seatwork or progress on computer curriculum, to get a sophisticated understanding of why some students are having trouble, and ideas for what to do about it. For example, it is possible that English learners are lacking school-specific vocabulary, such as that related to science or social studies, and this observation may suggest solutions (e.g., teach more science and social studies). But what if some English learners are afraid or unwilling to express themselves in class, but sit quietly and never volunteer answers? A completely different set of solutions might be appropriate in this case, such as using cooperative learning or tutoring strategies to give students safe spaces in which to use the vocabulary they have, and gain motivation and opportunities to learn and use more.

Benchmark assessments fall into the enormous category of educational solutions that are simple, compelling, and wrong. Yes, teachers need to know what students are learning and what is needed to improve it, but they have available many more tools that are far more sensitive, useful, timely, and tied to actions teachers can take.

Eliminating benchmark assessments would save schools a lot of money. Perhaps that money could be redirected to professional development to help teachers use approaches actually proven to work. I know, that’s crazy talk. But perhaps if we looked at what students are actually doing and learning in class, we could stop weighing pigs and start improving teaching for all children.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Do Different Textbooks Have Different Effects on Student Achievement?

The British comedy group Monty Python used to refer to “privileged glimpses into the perfectly obvious.”

And just last week, there they were. In a front-page article, the March 13 edition of Education Week reported that a six-state study of the achievement outcomes of different textbooks found . . . wait for it . . . near-zero relative effects on achievement measures (Sawchuk, 2019).

Really!

The study was led by Harvard’s Thomas Kane, a major proponent of the Common Core, who was particularly upset to find that textbooks produced before and after the Common Core began to influence textbook content had few if any differential effects on achievement.

I doubt that I am the only person who is profoundly unsurprised by these findings. For the past 12 years, I’ve been doing reviews of research on programs’ effects on achievement in rigorous research. Textbooks (or curricula) are usually one of the categories in my reviews. You can see the reviews at www.bestevidence.org. Here is a summary of the average effect sizes for textbooks or curricula:

Review                                       No. of Studies   Mean Effect Size
Elementary Reading (Inns et al., 2019)        9               +0.03
Elementary Math (Pellegrini et al., 2018)    16               +0.06
Secondary Math (Slavin et al., 2009)         40               +0.03
Secondary Science (Cheung et al., 2016)       8               +0.10
Weighted Average                             73               +0.04

None of these outcomes suggest that textbooks make much difference, and the study-weighted average of +0.04 is downright depressing.


Beyond the data, it is easy to see why evaluations of the achievement outcomes of textbooks rarely find significant positive outcomes. Such studies compare one textbook to another textbook that is usually rather similar. The reason is that textbook publishers respond to the demands of the market, not to evidence of effectiveness. New and existing textbooks were shaped by similar market forces. When standards change, as in the case of the Common Core State Standards in recent years, all textbook companies generally are forced to make changes in the same direction. There may be a brief window of time when new textbooks designed to meet new standards have a temporary advantage, but large publishers are extremely sensitive to such changes, and if they are not up to date in terms of standards today, they soon will be. Still, as the Kane et al. study found, changes in standards do not in themselves improve achievement on a substantial scale. Changes in standards do change market demand, which changes the content of textbooks, but fundamentally, the changes are not enough to make a measurable difference in learning.

Kane was quoted by Education Week as drawing the lesson from the study that perhaps it isn’t the textbooks that matter, but rather how the textbooks are used:

“What levels of coaching or more-intensive professional development are required to help teachers use rigorous materials at higher levels of fidelity, and does that produce larger benefits?” (Sawchuk, 2019, p. 17).

This sounds logical, but recent research in elementary mathematics calls this approach into question. Pellegrini et al. (2018) examined a category of programs that provide teachers with extensive professional development focused on math content and pedagogy. The average effect size across 12 studies was only +0.04, or essentially zero. In contrast, what did work very well were one-to-one and one-to-small group tutoring (mean effect size = +0.29) and professional development focused on classroom management and motivation (mean effect size = +0.25). In other words, programs focusing on helping teachers use standards-based materials added little if anything to the learning impact of textbooks. What mattered, beyond tutoring, were approaches that change classroom routines and relationships, such as cooperative learning or classroom management methods.

Changing textbooks matters little, and adding extensive professional development focused on standards adds even less. Instead, strategies that engage, excite, and accommodate individual needs of students are what we find to matter a great deal, across many subjects and grade levels.

This should be a privileged glimpse into the perfectly obvious. Everyone knows that textbooks make little difference. Walk through classrooms in any school, teaching any subject at any grade level. Some classes are exciting, noisy, fully engaged places in which students are eager to learn. Others are, well, teaching the textbook. In which type of class did you learn best? In which type do you hope your own children will spend their time in school, or wish they had?

What is obvious from the experience of every teacher and everyone who has ever been a student is that changing textbooks and focusing on standards do not in themselves lead to classrooms that kindle the love of learning. Imagine that you, as an accomplished adult educator, took a class in tennis, or Italian, or underwater basket weaving. Would a teacher using better textbooks and more advanced standards make you love this activity and learn from it? Or would a teacher who expresses enthusiasm for the subject and for the students, who uses methods that engage students in active social activities in every lesson, obtain better outcomes of every kind? I hope this question answers itself.

I once saw a science teacher in Baltimore teaching anatomy by having students take apart steamed crabs (a major delicacy in Baltimore). The kids were working in groups, laughing at this absurd idea, but they were learning like crazy, and learning to love science. I would submit that this experience, these connections among students, this laughter are the standards our schools need to attain. It’s not about textbooks, nor professional development on textbooks.

Another Baltimore teacher I knew taught a terrific unit on ancient Egypt. The students made their own sarcophagi, taking into the afterlife the things most important to them. Then the class went on a field trip to a local museum with a mummy exhibit, and finally, students made sarcophagi representing what Egyptians would value in the afterlife.  That’s what effective teaching is about.

The great 18th-century Swedish botanist Carl Linnaeus took his students on walks into the forests, fields, and lakes around Uppsala University. Whatever they found, they brought back held high, singing and playing conch-shell trumpets in triumph.  That’s what effective teaching is about.

In England, I saw a teacher teaching graph coordinates. She gave each student’s desk a coordinate, from 1, 1 to 5, 5, and put up signs labeled North, South, East, and West on the walls. She then made herself into a robot, and the students gave her directions to get from one coordinate to another. The students were laughing, but learning. That’s what effective teaching is about.

No textbook can compete with these examples of inspired teaching. Try to remember your favorite textbook, or your least favorite. I can’t think of a single one. They were all the same. I love to read and love to learn, and I’m sure anyone reading this blog is the same. But textbooks? Did a textbook ever inspire you to want to learn more or give you enthusiasm for any subject?

This is a privileged glimpse into the perfectly obvious to which we should devote our efforts in innovation and professional development. A textbook or standard never ignited a student’s passion or curiosity. Textbooks and standards may be necessary, but they will not transform our schools. Let’s use what we already know about how learning really happens, and then make certain that every teacher knows how to do the things that make learning engage students’ hearts and emotions, not just their minds.

References

Cheung, A., Slavin, R. E., Kim, E., & Lake, C. (2016). Effective secondary science programs: A best-evidence synthesis. Journal of Research in Science Teaching, 54(1), 58-81. https://doi.org/10.1002/tea.21338

Inns, A., Lake, C., Byun, S., Shi, C., & Slavin, R. E. (2019). Effective Tier 1 reading instruction for elementary schools: A systematic review. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, DC.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. E. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Manuscript submitted for publication.

Sawchuk, S. (2019, March 13). New texts failed to lift test scores in six-state study. Education Week, 38(25), 1, 17.

Slavin, R. E., Lake, C., & Groff, C. (2009). Effective programs in middle and high school mathematics: A best-evidence synthesis. Review of Educational Research, 79(2), 839-911.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

 

How Tutoring Could Benefit Students Who Do Not Need It

If you’ve been following my blogs, or if you know research on tutoring, you know that tutoring is hugely beneficial to the students who receive it. Recent research in both reading and math is finding important impacts of forms of tutoring that are much less expensive and scalable than the one-to-one tutoring by certified teachers that was once dominant. A review of research my colleagues and I did on effective programs for struggling readers found a mean effect size of +0.29 for one-to-small group tutoring provided by teaching assistants, across six studies of five programs involving grades K-5 (Inns, Lake, Pellegrini, & Slavin, 2018). Looking across the whole tutoring literature, in math as well as reading, positive outcomes of less expensive forms of tutoring are reliable and robust.

My focus today, however, is not on children who receive tutoring. It’s on all the other children. How does tutoring for the one third to one half of students in typical Title I schools who struggle in reading or math benefit the remaining students who were doing fine?

Imagine that Title I elementary schools had an average of three teaching assistants, each providing one-to-four tutoring in seven daily sessions. This would enable them to serve 84 students each day, or perhaps 252 over the course of the year if students are tutored in successive cycles. Here is how this could benefit all children.
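As a back-of-envelope check, here is the capacity arithmetic in a minimal sketch. The three-cycle year is my assumption (e.g., tutoring cycles of roughly 12 weeks each); the post itself gives only the totals.

```python
# Capacity arithmetic for the scenario above. The three-cycle year is an
# assumption; the paragraph gives only the daily and yearly totals.

TUTORS = 3             # teaching assistants per Title I school
SESSIONS_PER_DAY = 7   # tutoring sessions each tutor leads daily
GROUP_SIZE = 4         # one-to-four tutoring
CYCLES_PER_YEAR = 3    # assumed ~12-week tutoring cycles

students_per_day = TUTORS * SESSIONS_PER_DAY * GROUP_SIZE    # 84
students_per_year = students_per_day * CYCLES_PER_YEAR       # 252

print(students_per_day, students_per_year)  # 84 252
```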


Photo credit: Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Eliminating within-class ability grouping.

Teachers justifiably complain about the difficulty of teaching highly diverse classes. Historically, they have dealt with diversity, especially in reading, by assigning students to top, middle, and low ability groups, so that they can provide appropriate levels of instruction for each group. Managing multiple ability groups is very difficult, because two-thirds of the class has to do seatwork (paper or digital) during follow-up time, while the teacher is working with another reading group. The seatwork cannot be challenging, because if it were, students would be asking questions, and the whole purpose of this seatwork is to keep students quiet so the teacher can teach a reading group. As a result, kids do what they do when they are bored and the teacher is occupied. It’s not pretty.

Sufficient high-quality one-to-four reading tutoring could add an effect size of at least +0.29 to the reading performance of every student in the low reading group. The goal would be to move the entire low group to virtual equality with the middle group. So some low achievers might need more and some less tutoring, and a few might need one-to-one tutoring rather than one-to-four. If the low and middle reading groups could be made similar in reading performance, teachers could dispense with within-class grouping entirely, and teach the whole class as one “reading group.” By eliminating seatwork, this would give every reading class three times as much valuable instructional time. This would be likely to benefit learning for students in the (former) middle and high groups directly (due to more high-quality teaching), as well as taking a lot of stress off of the teacher, making the classroom more efficient and pleasant for all.

Improving behavior.

Ask any teacher which students are most likely to act out in class. It’s the low achievers. How could it be otherwise? Low achievers take daily blows to their self-esteem, and need to assert themselves in areas other than academics. One such “Plan B” for low achievers is misbehavior. If all students were succeeding in reading and math, improvements in behavior seem very likely. This would benefit everyone. I remember that my own very well-behaved daughter frequently came home from school upset because other students misbehaved and got in trouble for it. Improved behavior due to greater success for low achievers would be beneficial to struggling readers themselves, but also to their classmates.

Improved outcomes in other subjects.

Most struggling students have problems in reading and math, and these are the only subjects in which tutoring is ever provided. Yet students who struggle in reading or math are likely to also have trouble in science, social studies, and other subjects, and these problems are likely to disrupt teaching and learning in those subjects as well. If all could succeed in reading and math, this would surely have an impact on other subjects, for non-struggling as well as struggling students.

Contributing to the teacher pipeline.

In the plan I’ve discussed previously, the teaching assistants providing tutoring would mostly have Bachelor’s degrees but not teaching certificates. These tutors would be an ideal source of candidates for accelerated certification programs. Tutors who show potential could be invited to enroll in such programs. The teachers developed in this way would be a benefit to all schools and all students in the district.  This aspect would be of particular value in inner-city or rural areas that rely on teachers who grew up nearby and have roots in the area, as these districts usually have trouble attracting and retaining outsiders.

Reducing special education and retention.

A likely outcome of successful tutoring would be to reduce retentions and special education placements. This would be of great benefit to the students not retained or not sent to special education, but also to the school as a whole, which would save a great deal of money.

Ultimately, I think every teacher, every student, and every parent would love to see every low reading group improve in performance enough to eliminate the need for reading groups. The process to get to this happy state of affairs is straightforward and likely to succeed wherever it is tried. Wouldn’t a whole school and a whole school system full of success be a great thing for all students, not just the low achievers?

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.