Educational Policies vs. Educational Programs: Evidence from France

Ask any parents what their kids say when asked what they did in school today. Invariably, the kids respond, “Nuffin,” or some equivalent. My four-year-old granddaughter always says, “I played with my fwends.” All well and good.

However, in educational policy, policy makers often give the very same answer when asked, “What did the schools not using the (insert latest policy darling) do?”

“Nuffin’.” Or they say, “Whatever they usually do.” There’s nothing wrong with the latter answer if it’s true. But given the many programs now known to improve student achievement (see www.evidenceforessa.org), why don’t evaluators compare outcomes of new policy initiatives to those of proven educational programs known to improve the same outcomes the policy innovation is supposed to improve, perhaps at far lower cost per student? The evaluations should also compare to “business as usual,” but adding proven programs to evaluations of large policy innovations would help avoid declaring policy innovations successful when they are in fact just slightly more effective than “business as usual,” and much less effective or less cost-effective than alternative proven approaches. For example, when evaluating charter schools, why not routinely compare them to whole-school reform models that have similar objectives? When evaluating extending the school day or school year to help high-poverty schools, why not compare these innovations to using the same amount of additional money to hire tutors who use proven tutoring models to help struggling students? In evaluating policies in which students are held back if they do not read at grade level by third grade, why not compare these approaches to intensive phonics instruction and tutoring in grades K-3, which are known to greatly improve student reading achievement?

There is nuffin like a good fwend.

As one example of research comparing a policy intervention to a promising educational intervention, I recently saw a very interesting pair of studies from France. Ecalle, Gomes, Auphan, Cros, & Magnan (2019) compared two interventions applied in special priority areas with high poverty levels. Both interventions focused on reading in first grade.

One of the interventions involved halving class size, from approximately 24 students to 12. The other provided intensive reading instruction in small groups (4-6 children) to students who were struggling in reading, as well as less intensive interventions to larger groups (10-12 students). Low achievers got two 30-minute interventions each day for a year, while the higher-performing readers got one 30-minute intervention each day. In both cases, the focus of instruction was on phonics. In all cases, the additional interventions were provided by the students’ usual teachers.

The students in small classes were compared to students in ordinary-sized classes, while the students in the educational intervention were compared to students in same-sized classes who did not get the group interventions. Similar measures and analyses were used in both comparisons.

The results were nearly identical for the class size policy and the educational intervention. Halving class size had effect sizes of +0.14 for word reading and +0.22 for spelling. Results for the educational intervention were +0.13 for word reading, +0.12 for spelling, +0.14 for a group test of reading comprehension, +0.32 for an individual test of comprehension, and +0.19 for fluency.

These studies are less than perfect in experimental design, but they are nevertheless interesting. Most importantly, the class size policy required an additional teacher for each class of 24. Using Maryland’s annual teacher salary and benefits ($84,000), that extra teacher would cost about $3500 per student in our state ($84,000 ÷ 24). The educational intervention required one day of training and some materials. There was virtually no difference in outcomes, but the difference in cost was staggering.
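The per-student figure simply spreads one extra teacher’s annual cost across the class. A minimal Python sketch of that arithmetic, assuming (as above) one added teacher per class of 24 and $84,000 per year in salary and benefits; the function name is illustrative only:

```python
def added_cost_per_student(teacher_cost: float, class_size: int, extra_teachers: int = 1) -> float:
    """Annual cost per student of adding `extra_teachers` to a class of `class_size` students."""
    return teacher_cost * extra_teachers / class_size

# Maryland salary-plus-benefits figure and the French class size, both from the text.
cost = added_cost_per_student(84_000, 24)
print(f"${cost:,.0f} per student per year")
```

Passing a different salary figure shows how the cost of halving class size would scale in another state.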

The class size policy was mandated by the Ministry of Education. The educational intervention was offered to schools and provided by a university and a non-profit. As is so often the case, the policy intervention was simplistic, easy to describe in the newspaper, and minimally effective. The class size policy reminds me of a Florida program that extended the school schedule by an hour every day in high-poverty schools, mainly to provide more time for reading instruction. The cost per child was about $800 per year. The outcomes were minimal (ES=+0.05).

After many years of watching what schools do and reviewing research on outcomes of innovations, I find it depressing that policies mandated on a substantial scale are so often found to be ineffective. They are usually far more expensive than much more effective, rigorously evaluated programs that are, however, a bit more difficult to describe, and rarely arouse great debate in the political arena. It’s not that anyone is opposed to the educational intervention, but it is a lot easier to carry a placard saying “Reduce Class Size Now!” than to carry one saying “Provide Intensive Phonics in Small Groups with More Supplemental Teaching for the Lowest Achievers Now!” The latter just does not fit on a placard, and though easy to understand if explained, it does not lend itself to easy communication. Actually, there are much more effective first grade interventions than the one evaluated in France (see www.evidenceforessa.org). At a cost much less than $3500 per student, several one-to-one tutoring programs using well-trained teaching assistants as tutors would have been able to produce an effect size of more than +0.50 for all first graders on average. This would even fit on a placard: “Tutoring Now!”

I am all in favor of trying out policy innovations. But when kids in a proven-program comparison group are asked by their parents what they did in school today, they shouldn’t say “nuffin’.” They should say, “My tooter taught me to read. And I played with my fwends.”

References

Ecalle, J., Gomes, C., Auphan, P., Cros, L., & Magnan, A. (2019). Effects of policy and educational interventions intended to reduce difficulties in literacy skills in grade 1. Studies in Educational Evaluation, 61, 12-20.

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.


Is ES=+0.50 Achievable?: Schoolwide Approaches That Might Meet This Standard

In a recent blog, “Make No Small Plans,” I proposed a system innovators could use to create very effective schoolwide programs.  I defined these as programs capable of making a difference in student achievement large enough to bring entire schools serving disadvantaged students to the levels typical of middle class schools.  On average, that would mean creating school models that could routinely add an effect size of +0.50 for entire disadvantaged schools.  +0.50, or half a standard deviation, is roughly the average difference between students who qualify for free lunch and those who do not, between African American and White students, and between Hispanic and non-Hispanic White students.

Today, I want to give some examples of approaches intended to meet the +0.50 goal. From prior work, my colleagues and I have already created a successful schoolwide reform model, Success for All, which, with adequate numbers of tutors (as many as six per school), achieved reading effect sizes in high-poverty Baltimore elementary schools of over +0.50 for all students and +0.75 for the lowest-achieving quarter of students (Madden et al., 1993). These outcomes were maintained through eighth grade, and the program produced substantial reductions in grade retentions and special education placements (Borman & Hewes, 2003). Steubenville, in Ohio’s Rust Belt, uses Success for All in all of its Title I elementary schools, providing several tutors in each. Each year, Steubenville schools score among the highest in Ohio on state tests, exceeding most wealthy suburban schools. Other SFA schools with sufficient tutors are also exemplary in achievement gains. Yet most SFA schools face a dilemma: they cannot afford significant numbers of tutors. They still get excellent results, but smaller gains than those typical of SFA schools that do have sufficient tutors.


We are now planning another approach, also intended to produce schoolwide effect sizes of at least +0.50 in schools serving disadvantaged students.   However, in this case our emphasis is on tutoring, the most effective strategy known for improving the achievement of struggling readers (Inns et al., 2019).  We are calling this approach the Reading Safety Net.  Main components of this plan are as follows:

Tutoring

Like the most successful forms of Success for All, the Reading Safety Net places a substantial emphasis on tutoring.  Tutors will be well-qualified teaching assistants with BAs but not teaching certificates, extensively trained to provide one-to-four tutoring.   Tutors will use a proven computer-assisted model in which students do a lot of pair teaching.  This is what we now call our Tutoring With the Lightning Squad model, which achieved outcomes of +0.40 and +0.46 in two studies in the Baltimore City Public Schools (Madden & Slavin, 2017).  A high-poverty school of 500 students might engage about five tutors, providing extensive tutoring to the majority of students, for as many years as necessary.  One additional tutor or teacher will supervise the tutors and personally work with students having the most serious problems.   We will provide significant training and follow-up coaching to ensure that all tutors are effective.


Attendance and Health

Many students fail in reading or other subjects because they have attendance problems or certain common health problems. We propose to provide a health aide to help solve these problems.

Attendance

Many students, especially those in high-poverty schools, fail because they do not attend school regularly. Yet there are several proven approaches for increasing attendance and reducing chronic truancy (Shi, Inns, Lake, & Slavin, 2019). Health aides will help teachers and other staff organize and manage effective attendance improvement approaches.

Vision Services

My colleagues and I have designed strategies to help ensure that all students who need eyeglasses receive them. A key problem in this work is ensuring that students who receive glasses use them, keep them safe, and replace them if they are lost or broken. Health aides will coordinate use of proven strategies to increase regular use of needed eyeglasses.


Asthma and other health problems

Many students in high-poverty schools suffer from chronic illnesses. Effective treatments are known for many of these, but treatments may not work if medications are not taken daily. For example, asthma is common in high-poverty schools, where it is the top cause of hospital referrals and a leading cause of death for school-age children. Inexpensive inhalers can substantially improve children’s health, yet many children do not regularly take their medicine. Studies suggest that having trained staff ensure that students take their medicine, and watch them doing so, can make a meaningful difference. The same may be true of other chronic, easily treated diseases common among children but often not consistently treated in inner-city schools. Health aides with special supplemental training may be able to play a key on-the-ground role in helping ensure effective treatment for asthma and other diseases.

Potential Impact

The Reading Safety Net is only a concept at present.  We are seeking funding to support its further development and evaluation.  As we work with front line educators, colleagues, and others to further develop this model, we are sure to find ways to make the approach more effective and cost-effective, and perhaps extend it to solve other key problems.

We cannot yet claim that the Reading Safety Net has been proven effective, although many of its components have been.  But we intend to do a series of pilots and component evaluations to progressively increase the impact, until that impact attains or surpasses the goal of ES=+0.50.  We hope that many other research teams will mobilize and obtain resources to find their own ways to +0.50.  A wide variety of approaches, each of which would be proven to meet this ambitious goal, would provide a range of effective choices for educational leaders and policy makers.  Each would be a powerful, replicable tool, capable of solving the core problems of education.

We believe that with sufficient investment and encouragement from funders, this goal is attainable. If it is in fact attainable, how could we accept anything less?

References

Borman, G., & Hewes, G. (2003).  Long-term effects and cost effectiveness of Success for All.  Educational Evaluation and Policy Analysis, 24 (2), 243-266.

Inns, A., Lake, C., Pellegrini, M., & Slavin, R. (2019). A synthesis of quantitative research on programs for struggling readers in elementary schools. Manuscript submitted for publication.

Madden, N. A., & Slavin, R. E. (2017). Evaluations of Technology-Assisted Small-Group Tutoring for Struggling Readers. Reading & Writing Quarterly, 1-8.

Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L., & Wasik, B. (1993). Success for All: Longitudinal effects of a schoolwide elementary restructuring program. American Educational Research Journal, 30, 123-148.

Shi, C., Inns, A., Lake, C., & Slavin, R. E. (2019). Effective school-based programs for K-12 students’ attendance: A best-evidence synthesis. Baltimore, MD: Center for Research and Reform in Education, Johns Hopkins University.

 


Effect Sizes and Additional Months of Gain: Can’t We Just Agree That More is Better?

In the 1984 mockumentary This is Spinal Tap, there is a running joke about a hapless band, Spinal Tap, which proudly bills itself “Britain’s Loudest Band.”  A pesky reporter keeps asking the band’s leader, “But how can you prove that you are Britain’s loudest band?” The band leader explains, with declining patience, that while ordinary amplifiers’ sound controls only go up to 10, Spinal Tap’s go up to 11.  “But those numbers are arbitrary,” says the reporter.  “They don’t mean a thing!”  “Don’t you get it?” asks the band leader.  “ELEVEN is more than TEN!  Anyone can see that!”

In educational research, we have an ongoing debate reminiscent of Spinal Tap.  Educational researchers speaking to other researchers invariably express the impact of educational treatments as effect sizes (the difference in adjusted means for the experimental and control groups divided by the unadjusted standard deviation).  All else being equal, higher effect sizes are better than lower ones.
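The definition above is a one-line formula. Here is a minimal Python sketch of it; the function name and the example numbers are hypothetical, and the choice of which unadjusted standard deviation goes in the denominator (control group, pooled, and so on) varies between reviews, so it is simply passed in:

```python
def effect_size(adj_mean_treatment: float, adj_mean_control: float, unadjusted_sd: float) -> float:
    """Difference in adjusted means divided by the unadjusted standard deviation."""
    if unadjusted_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (adj_mean_treatment - adj_mean_control) / unadjusted_sd

# Hypothetical numbers: adjusted means of 52 vs. 50 on a test whose standard deviation is 10.
print(effect_size(52.0, 50.0, 10.0))
```

A two-point advantage on such a test is an effect size of +0.20; a negative result means the control group did better.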

However, educators who are not trained in statistics often despise effect sizes.  “What do they mean?” they ask.  “Tell us how much difference the treatment makes in student learning!”

Researchers want to be understood, so they try to translate effect sizes into more educator-friendly equivalents. The problem is that the friendlier the units, the more statistically problematic they are. The friendliest of all is “additional months of learning.” Researchers or educators can look up any particular effect size on a chart and find the corresponding number of “additional months of learning.” The Education Endowment Foundation in England, which funds and reports on rigorous experiments, reports both effect sizes and additional months of learning, and provides tables to help people make the conversion. But here’s the rub. A recent article by Baird and Pane (2019) compared additional months of learning to three other translations of effect sizes. Additional months of learning was rated highest in ease of use, but lowest in four other categories, such as transparency and consistency. For example, a month of learning clearly has a different meaning in kindergarten than it does in tenth grade.
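One common way to produce “months of learning” (an assumption here; this is not the EEF’s actual conversion table) is to divide an effect size by the typical annual gain, in standard deviation units, for the grade tested, and multiply by the months in a school year. The Python sketch below uses made-up annual gains, larger in kindergarten than in tenth grade, purely to illustrate why the same effect size can become very different numbers of months:

```python
def months_of_learning(effect_size: float, annual_growth_sd: float, months_per_year: float = 9.0) -> float:
    """Convert an effect size into 'additional months', given typical annual growth in SD units."""
    return effect_size / annual_growth_sd * months_per_year

# Hypothetical annual growth rates, for illustration only (not taken from Baird & Pane or the EEF).
HYPOTHETICAL_ANNUAL_GROWTH = {"kindergarten": 1.0, "grade 10": 0.2}

for grade, growth in HYPOTHETICAL_ANNUAL_GROWTH.items():
    print(grade, round(months_of_learning(0.10, growth), 1))
```

Under these assumed growth rates, an effect size of +0.10 is worth less than one month in kindergarten but several months in tenth grade.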

The other translations rated higher by Baird and Pane were, at least to me, just as hard to understand as effect sizes.  For example, the What Works Clearinghouse presents, along with effect sizes, an “improvement index” that has the virtue of being equally incomprehensible to researchers and educators alike.
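For what it is worth, the improvement index is less mysterious than it looks. As I understand the What Works Clearinghouse’s definition, it is the expected change in percentile rank for an average comparison-group student had that student received the intervention. Assuming normally distributed scores, it can be computed from the effect size with the standard normal CDF, as in this minimal Python sketch (standard library only; the function names are illustrative):

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def improvement_index(effect_size: float) -> float:
    """Expected percentile-point gain for an average comparison-group student."""
    return 100.0 * normal_cdf(effect_size) - 50.0

for es in (0.05, 0.08, 0.50):
    print(f"ES = {es:+.2f} -> improvement index {improvement_index(es):+.1f}")
```

An effect size of +0.50 moves an average student from the 50th to roughly the 69th percentile, an improvement index of about +19.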

On the one hand, arguing about outcome metrics is as silly as arguing the relative virtues of Fahrenheit and Celsius. If one can be directly transformed into the other, who cares?

However, additional months of learning is often used to cover up very low effect sizes. I recently ran into an example of this in a series of studies by the Stanford Center for Research on Education Outcomes (CREDO), in which disadvantaged urban African American students in charter schools gained 59 more “days of learning” in math than matched students not in charters, and 44 more days in reading. These numbers were cited in an editorial praising charter schools in the May 29 Washington Post.

However, these “days of learning” are misleading. The effect size for this same comparison was only +0.08 for math and +0.06 for reading. Any researcher will tell you that these are very small effects. They were only made to look big by reporting the gains in days. Days not only magnify the apparent differences, but also make them unstable. Would it interest you to know that White students in urban charter schools performed 36 days a year worse than matched students in math (ES= -0.05) and 14 days worse in reading (ES= -0.02)? How about Native American students in urban charter schools, whose scores were 70 days worse than those of matched students in non-charters in math (ES= -0.10), and equal in reading? I wrote about charter school studies in a recent blog. In that blog, I did not argue that charter schools are effective for disadvantaged African Americans but harmful for Whites and Native Americans. That seems unlikely. What I did argue is that the effects of charter schools are so small that the directions of the effects are unstable. The overall effects across all urban schools studied were only 40 days (ES=+0.055) in math and 28 days (ES=+0.04) in reading. These effects look big because of the “days of learning” transformation, but they are not.
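One way to see what the transformation does is to divide each reported number of days by its effect size. I do not know CREDO’s exact conversion method; this Python sketch only infers the implied ratio from the pairs quoted above (the reading result for Native American students is left out because its effect size is zero):

```python
# (days of learning, effect size) pairs as quoted in the text above
REPORTED = {
    "African American, math": (59, 0.08),
    "African American, reading": (44, 0.06),
    "White, math": (-36, -0.05),
    "White, reading": (-14, -0.02),
    "Native American, math": (-70, -0.10),
    "All urban, math": (40, 0.055),
    "All urban, reading": (28, 0.04),
}

def implied_days_per_sd(days: float, effect_size: float) -> float:
    """Days of learning implied by one full standard deviation, given a reported pair."""
    return days / effect_size

ratios = {label: implied_days_per_sd(d, es) for label, (d, es) in REPORTED.items()}
for label, r in ratios.items():
    print(f"{label:28s} {r:6.0f} days per SD")
print(f"range: {min(ratios.values()):.0f} to {max(ratios.values()):.0f}")
```

Every ratio lands between about 700 and 740 days per standard deviation, so the “days” are essentially a fixed multiple of the effect size; with effect sizes rounded to two decimals, the spread among the ratios could easily come from rounding alone. At that rate, even an effect of +0.01 would be reported as about seven “days.”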

In This is Spinal Tap, the argument about whether or not Spinal Tap is Britain’s loudest band is absurd.  Any band can turn its amplifiers to the top and blow out everyone’s eardrums, whether the top is marked eleven or ten.  In education, however, it does matter a great deal that educators are taking evidence into account in their decisions about educational programs. Using effect sizes, perhaps supplemented by additional months of learning, is one way to help readers understand outcomes of educational experiments. Using “days of learning,” however, is misleading, making very small impacts look important. Why not additional hours or minutes of learning, while we’re at it? Spinal Tap would be proud.

References

Baird, M., & Pane, J. (2019). Translating standardized effects of education programs into more interpretable metrics. Educational Researcher. Advance online publication. doi.org/10.3102/0013189X19848729

CREDO (2015). Overview of the Urban Charter School Study. Stanford, CA: Author.

Washington Post: Denying poor children a chance. [Editorial]. (May 29, 2019). The Washington Post, A16.

 


Charter Schools? Smarter Schools? Why Not Both?

I recently saw an editorial in the May 29 Washington Post, entitled “Denying Poor Children a Chance,” a pro-charter school opinion piece that makes dire predictions about the damage to poor and minority students that would follow if charter expansion were to be limited.  In education, it is common to see evidence-free opinions for and against charter schools, so I was glad to see actual data in the Post editorial.   In my view, if charter schools could routinely and substantially improve student outcomes, especially for disadvantaged students, I’d be a big fan.  My response to charter schools is the same as my response to everything else in education: Show me the evidence.

The Washington Post editorial cited a widely known 2015 Stanford CREDO study comparing urban charter schools to matched traditional public schools (TPS) in the same districts.  Evidence always attracts my attention, so I decided to look into this and other large, multi-district studies. Despite the Post’s enthusiasm for the data, the average effect size was only +0.055 for math and +0.04 for reading.  By anyone’s standards, these are very, very small outcomes.  Outcomes for poor, urban, African American students were somewhat higher, at +0.08 for math and +0.06 for reading, but on the other hand, average effect sizes for White students were negative, averaging -0.05 for math and -0.02 for reading.  Outcomes were also negative for Native American students: -0.10 for math, zero for reading.  With effect sizes so low, these small differences are probably just different flavors of zero.  A CREDO (2013) study of charter schools in 27 states, including non-urban as well as urban schools, found average effect sizes of +0.01 for math and -0.01 for reading. How much smaller can you get?

In fact, the CREDO studies have been widely criticized for using techniques that inflate the estimated effects of charter schools. They compare students in charter schools to students in traditional public schools, matching on pretests and ethnicity. This ignores the obvious fact that students in charter schools chose to go there, or their parents chose for them to go. There is every reason to believe that students who choose to attend charter schools are, on average, higher-achieving, more highly motivated, and better behaved than students who stay in traditional public schools. Gleason et al. (2010) found that students who applied to charter schools started off 16 percentage points higher in reading and 13 percentage points higher in math than others in the same schools who did not apply. Applicants were more likely to be White and less likely to be African American or Hispanic, and they were less likely to qualify for free lunch. Self-selection is a particular problem in studies of students who choose or are sent to “no-excuses” charters, such as KIPP or Success Academies, because the students or their parents know students will be held to very high standards of behavior and accomplishment, and may be encouraged to leave the school if they do not meet those standards. This is not a criticism of KIPP or Success Academies, but when such charter systems use lotteries to select students, the students who show up for the lotteries were at least motivated enough to enter a lottery to attend a very demanding school.

Well-designed studies of charter schools usually focus on schools that use lotteries to select students, and then they compare the students who were successful in the lottery to those who were not so lucky. This eliminates the self-selection problem, as students were selected by a random process. The CREDO studies do not do this, and this may be why their studies report higher (though still very small) effect sizes than those reported by syntheses of studies of students who all applied to charters, but may have been “lotteried in” or “lotteried out” at random. A very rigorous WWC synthesis of such studies by Gleason et al. (2010) found that middle school students who were lotteried into charter schools in 32 states performed non-significantly worse than those lotteried out, in math (ES=-0.06) and in reading (ES=-0.08). A 2015 update of the WWC study (Clark et al., 2015) found very similar, slightly negative outcomes in reading and math.

It is important to note that “no-excuses” charter schools, mentioned earlier, have had more positive outcomes than other charters.  A recent review of lottery studies by Cheng et al. (2017) found effect sizes of +0.25 for math and +0.17 for reading.  However, such “no-excuses” charters are a tiny percentage of all charters nationwide.


Other meta-analyses of studies of achievement outcomes of charter schools also exist, but none found effect sizes as high as the CREDO urban study.  The means of +0.055 for math and +0.04 for reading represent upper bounds for effects of urban charter schools.

Charter Schools or Smarter Schools?

So far, every study of the achievement effects of charters has compared charters to traditional public schools. However, this should not be the only question. “Charters” and “non-charters” do not exhaust the range of possibilities.

What if we instead ask this question: Among the range of programs available, which are most likely to be effective at scale?

To illustrate the importance of this question, consider a study in England, which evaluated a program called Engaging Parents Through Mobile Phones. The program involves texting parents on cell phones to alert them to upcoming tests, inform them about whether students are completing their homework, and tell them what students were being taught in school. A randomized evaluation (Miller et al., 2016) found effect sizes of +0.06 for math and +0.03 for reading, remarkably similar to the urban charter school effects reported by CREDO (2015). The cost of the mobile phone program was £6 per student per year, or $7.80. If you like the outcomes of charter schools, might you prefer to get the same outcomes for $7.80 per child per year, without all the political, legal, and financial stresses of charter schools?

The point here is that rather than arguing about the size of small charter effects, one could consider charters a “treatment” and compare them to other proven approaches.  In our Evidence for ESSA website, we list 112 reading and math programs that meet ESSA standards for “Strong,” “Moderate,” or “Promising” evidence of effectiveness.  Of these, 107 had effect sizes larger than those CREDO (2015) reports for urban charter schools.  In both math and reading, there are many programs with average effect sizes of +0.20, +0.30, up to more than +0.60.  If applied as they were in the research, the best of these programs could, for example, entirely overcome Black-White and Hispanic-White achievement gaps in one or two years.

A few charter school networks have their own proven educational approaches, but the many charters that do not have proven programs should be looking for them.  Most proven programs work just as well in charter schools as they do in traditional public schools, so there is no reason existing charter schools should not proactively seek proven programs to increase their outcomes.  For new charters, wouldn’t it make sense for chartering agencies to encourage charter applicants to systematically search for and propose to adopt programs that have strong evidence of effectiveness?  Many charter schools already use proven programs.  In fact, there are several that specifically became charters to enable them to adopt or maintain our Success for All whole-school reform program.

There is no reason for any conflict between charter schools and smarter schools.  The goal of every school, regardless of its governance, should be to help students achieve their full potential, and every leader of a charter or non-charter school would agree with this. Whatever we think about governance, all schools, traditional or charter, should get smarter, using proven programs of all sorts to improve student outcomes.

References

Cheng, A., Hitt, C., Kisida, B., & Mills, J. N. (2017). “No excuses” charter schools: A meta-analysis of the experimental evidence on student achievement. Journal of School Choice, 11 (2), 209-238.

Clark, M. A., Gleason, P. M., Tuttle, C. C., & Silverberg, M. K. (2015). Do charter schools improve student achievement? Educational Evaluation and Policy Analysis, 37 (4), 419-436.

Gleason, P. M., Clark, M. A., Tuttle, C. C., & Dwoyer, E. (2010). The evaluation of charter school impacts. Washington, DC: What Works Clearinghouse.

Miller, S., Davison, J., Yohanis, J., Sloan, S., Gildea, A., & Thurston, A. (2016). Texting parents: Evaluation report and executive summary. London: Education Endowment Foundation.

Washington Post: Denying poor children a chance. [Editorial]. (May 29, 2019). The Washington Post, A16.


Can Computers Teach?

Something’s coming

I don’t know

What it is

But it is

Gonna be great!

-Something’s Coming, West Side Story

For more than 40 years, educational technology has been on the verge of transforming educational outcomes for the better. The song “Something’s Coming,” from West Side Story, captures the feeling. We don’t know how technology is going to solve our problems, but it’s gonna be great!

Technology Counts is an occasional section of Education Week. Usually, it publishes enthusiastic predictions about the wonders around the corner, in line with its many advertisements for technology products of all kinds. So it was a bit of a shock to see the most recent edition, dated April 24. An article entitled, “U.S. Teachers Not Seeing Tech Impact,” by Benjamin Herold, reported a nationally representative survey of 700 teachers. They reported huge purchases of digital devices, software, learning apps, and other technology in the past three years. That’s not news, if you’ve been in schools lately. But if you think technology is doing “a lot” to support classroom innovation, you’re out of step with most of the profession. Only 29% of teachers would agree with you, but 41% say “some,” 26% “a little,” and 4% “none.” Equally modest proportions say that technology has “changed their work as a teacher.” The Technology Counts articles describe most teachers as using technology to help them do what they have always done, rather than to innovate.

There are lots of useful things technology is used for, such as teaching students to use computers, and technology may make some tasks easier for teachers and students. But from their earliest beginnings, everyone hoped that computers would help students learn traditional subjects, such as reading and math. Do they?


The answer is, not so much. The table below shows average effect sizes for technology programs in reading and math, using data from four recent rigorous reviews of research. Three of these have been posted at www.bestevidence.org. The fourth, on reading strategies for all students, will be posted in the next few weeks.

Mean Effect Sizes for Applications of Technology in Reading and Mathematics

Review                                    Number of Studies   Mean Effect Size
Elementary Reading                        16                  +0.09
Elementary Reading – Struggling Readers   6                   +0.05
Secondary Reading                         23                  +0.08
Elementary Mathematics                    14                  +0.07
Study-Weighted Mean                       59                  +0.08

An effect size of +0.08, which is the average across the four reviews, is not zero. But it is not much. It is certainly not revolutionary. Also, the effects of technology are not improving over time.

As a point of comparison, here are the average effect sizes for tutoring by teaching assistants:

Review                                    Number of Studies   Mean Effect Size
Elementary Reading – Struggling Readers   7                   +0.34
Secondary Reading                         2                   +0.23
Elementary Mathematics                    10                  +0.27
Study-Weighted Mean                       19                  +0.29

On average, tutoring by teaching assistants is more than 3½ times as effective as technology (+0.29 vs. +0.08). Yet the cost difference between tutoring and technology, especially for effective one-to-small-group tutoring by teaching assistants, is not large.
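The study-weighted means in the two tables, and the “3½ times” comparison, can be checked with a few lines of Python. This sketch weights each review’s mean by its number of studies, which is how I read “study-weighted mean”; the row values are copied from the tables:

```python
# (number of studies, mean effect size) for each review, copied from the two tables
TECHNOLOGY = [(16, 0.09), (6, 0.05), (23, 0.08), (14, 0.07)]
TUTORING_BY_TAS = [(7, 0.34), (2, 0.23), (10, 0.27)]

def study_weighted_mean(rows):
    """Mean effect size, weighting each review by its number of studies."""
    total_studies = sum(n for n, _ in rows)
    return sum(n * es for n, es in rows) / total_studies

tech = study_weighted_mean(TECHNOLOGY)
tutor = study_weighted_mean(TUTORING_BY_TAS)
print(f"technology {tech:+.3f}, tutoring {tutor:+.3f}, ratio {tutor / tech:.1f}")
```

Rounded to two decimals, the means are +0.08 and +0.29, matching the tables. Unrounded, the ratio is about 3.8; with the rounded values it is about 3.6, so either way it exceeds 3½.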

Tutoring is not the only effective alternative to technology. Our reviews have identified many types of programs that are more effective than technology.

A valid argument for continuing to use technology is that eventually, we are bound to come up with more effective technology strategies. It is certainly worthwhile to keep experimenting. But this argument has been made since the early 1970s, and technology is still not ready for prime time, at least as far as teaching reading and math is concerned. I still believe that technology’s day will come, when strategies to get the best from both teachers and technology will reliably be able to improve learning. Until then, let’s use programs and practices already proven to be effective, as we continue to work to improve the outcomes of technology.

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Could Proven Programs Eliminate Gaps in Elementary Reading Achievement?

What if every child in America could read at grade level or better? What if the number of students in special education for learning disabilities, or retained in grade, could be cut in half?

What if students who become behavior problems or give up on learning because of nothing more than reading difficulties could instead succeed in reading and no longer be frustrated by failure?

Today these kinds of outcomes are only pipe dreams. Despite decades of effort and billions of dollars directed toward remedial and special education, reading levels have barely increased. Gaps between middle class and economically disadvantaged students remain wide, as do gaps between ethnic groups. We’ve done so much, you might think, and nothing has really worked at scale.

Yet today we have many solutions to the problems of struggling readers, solutions so effective that if widely and effectively implemented, they could substantially change not only the reading skills, but the life chances of students who are struggling in reading.

How do I know this is possible? The answer is that the evidence is there for all to see.

This week, my colleagues and I released a review of research on programs for struggling readers. The review, written by Amanda Inns, Cynthia Lake, Marta Pellegrini, and myself, uses academic language and rigorous review methods. But you don’t have to be a research expert to understand what we found out. In ten minutes, just reading this blog, you will know what needs to be done to have a powerful impact on struggling readers.

Everyone knows that there are substantial gaps in student reading performance according to social class and race. According to the National Assessment of Educational Progress, or NAEP, here are key gaps in terms of effect sizes at fourth grade:

Comparison                                      Gap in Effect Sizes
No Free/Reduced Lunch vs. Free/Reduced Lunch    0.56
White vs. African American                      0.52
White vs. Hispanic                              0.46

These are big differences. In order to eliminate these gaps, we’d have to provide schools serving disadvantaged and minority students with programs or services sufficient to increase their reading scores by about a half standard deviation. Is this really possible?
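To make "about a half standard deviation" concrete, the short Python sketch below averages the three NAEP gaps listed above and checks which of them a program of a given effect size would fully offset. It assumes, as the paragraph does, that a program's effect size can be read directly against a gap measured in the same standard-deviation units; the names `naep_gaps` and `gaps_fully_offset` are ours, for illustration.

```python
# Fourth-grade NAEP reading gaps in effect-size units, as listed above.
naep_gaps = {
    "No free/reduced lunch vs. free/reduced lunch": 0.56,
    "White vs. African American": 0.52,
    "White vs. Hispanic": 0.46,
}

# Simple unweighted average of the three gaps.
average_gap = sum(naep_gaps.values()) / len(naep_gaps)
print(f"Average gap: {average_gap:.2f} standard deviations")


def gaps_fully_offset(effect_size, gaps=naep_gaps):
    """Return the names of gaps no larger than the given effect size.

    Assumption (from the text): an effect size and a gap are directly
    comparable because both are expressed in standard-deviation units.
    """
    return [name for name, gap in gaps.items() if gap <= effect_size]


print(gaps_fully_offset(0.50))
```

The average is just over 0.50, which is why a program effect of roughly +0.50 is the benchmark the rest of the post uses.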

Can We Really Eliminate Such Big and Longstanding Gaps?

Yes, we can. And we can do it cost-effectively.

Our review examined thousands of studies of programs intended to improve the reading performance of struggling readers. We found 59 studies of 39 different programs that met very high standards of research quality. Of these qualifying studies, 73% used random assignment to experimental or control groups, just as the most rigorous medical studies do. We organized the programs into response to intervention (RTI) tiers:

Tier 1 means whole-class programs, not just for struggling readers

Tier 2 means targeted services for students who are struggling to read

Tier 3 means intensive services for students who have serious difficulties.

Our categories were as follows:

Multi-Tier (Tier 1 + tutoring for students who need it)

Tier 1:

  • Whole-class programs

Tier 2:

  • Technology programs
  • One-to-small group tutoring

Tier 3:

  • One-to-one tutoring

We are not advocating for RTI itself, because the data on RTI are unclear. But it is just common sense to use proven programs with all students, then proven remedial approaches with struggling readers, then intensive services for students for whom Tier 2 is not sufficient.

Do We Have Proven Programs Able to Overcome the Gaps?

The table below shows average effect sizes for specific reading approaches. Wherever you see effect sizes that approach or exceed +0.50, you are looking at proven solutions to the gaps, or at least programs that could become a component in a schoolwide plan to ensure the success of all struggling readers.

Programs That Work for Struggling Elementary Readers

Program                                                Grades Proven  No. of Studies  Mean Effect Size
Multi-Tier Approaches
  Success for All                                      K-5            3               +0.35
  Enhanced Core Reading Instruction                    1              1               +0.24
Tier 1 – Classroom Approaches
  Cooperative Integrated Reading & Composition (CIRC)  2-6            3               +0.11
  PALS                                                 1              1               +0.65
Tier 2 – One-to-Small Group Tutoring
  Read, Write, & Type (T 1-3)                          1              1               +0.42
  Lindamood (T 1-3)                                    1              1               +0.65
  SHIP (T 1-3)                                         K-3            1               +0.39
  Passport to Literacy (TA 1-4/7)                      4              4               +0.15
  Quick Reads (TA 1-2)                                 2-3            2               +0.22
Tier 3 – One-to-One Tutoring
  Reading Recovery (T)                                 1              3               +0.47
  Targeted Reading Intervention (T)                    K-1            2               +0.50
  Early Steps (T)                                      1              1               +0.86
  Lindamood (T)                                        K-2            1               +0.69
  Reading Rescue (T or TA)                             1              1               +0.40
  Sound Partners (TA)                                  K-1            2               +0.43
  SMART (PV)                                           K-1            1               +0.40
  SPARK (PV)                                           K-2            1               +0.51

Key:
T: Certified teacher tutors
TA: Teaching assistant tutors
PV: Paid volunteers (e.g., AmeriCorps members)
1-X: For small group tutoring, the usual group size (e.g., 1-2, 1-4)

(For more information on each program, see www.evidenceforessa.org)

The table is a road map to eliminating the achievement gaps that our schools have wrestled with for so long. It lists only programs that succeeded at a high level relative to others at the same tier. See the full report or www.evidenceforessa.org for information on all programs.

It is important to note that there is little evidence of the effectiveness of tutoring in grades 3-5. Almost all of the evidence is from grades K-2. However, studies done in England in secondary schools have found positive effects of three reading tutoring programs in the English equivalent of U.S. grades 6-7. These findings suggest that when well-designed tutoring programs for grades 3-5 are evaluated, they will also show very positive impacts. See our review on secondary reading programs at www.bestevidence.org for information on these English middle school tutoring studies. On the same website, you can also see a review of research on elementary mathematics programs, which reports that most of the successful studies of tutoring in math took place in grades 2-5, another indicator that reading tutoring is also likely to be effective in these grades.

Some of the individual programs have shown effects large enough to overcome gaps all by themselves if they are well implemented (i.e., ES = +0.50 or more). Others have effect sizes below +0.50 but are likely to eliminate gaps if combined with other programs on the list or used over longer periods. For example, one-to-one tutoring by certified teachers is very effective, but very expensive. A school might implement a Tier 1 or multi-tier approach to solve all the easy problems inexpensively, then use cost-effective one-to-small group methods for students with moderate reading problems, and only then use one-to-one tutoring with the small number of students with the greatest needs.
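The +0.50 threshold can be applied mechanically to the table of programs for struggling readers. The Python sketch below transcribes the table's mean effect sizes and lists the programs at or above a given threshold; the `PROGRAMS` list and the `programs_at_or_above` function are ours, for illustration, and carry only the program name, tier, and mean effect size.

```python
# Mean effect sizes transcribed from the table "Programs That Work for
# Struggling Elementary Readers": (program, tier, mean effect size).
PROGRAMS = [
    ("Success for All", "Multi-Tier", 0.35),
    ("Enhanced Core Reading Instruction", "Multi-Tier", 0.24),
    ("Cooperative Integrated Reading & Composition (CIRC)", "Tier 1", 0.11),
    ("PALS", "Tier 1", 0.65),
    ("Read, Write, & Type (T 1-3)", "Tier 2", 0.42),
    ("Lindamood (T 1-3)", "Tier 2", 0.65),
    ("SHIP (T 1-3)", "Tier 2", 0.39),
    ("Passport to Literacy (TA 1-4/7)", "Tier 2", 0.15),
    ("Quick Reads (TA 1-2)", "Tier 2", 0.22),
    ("Reading Recovery (T)", "Tier 3", 0.47),
    ("Targeted Reading Intervention (T)", "Tier 3", 0.50),
    ("Early Steps (T)", "Tier 3", 0.86),
    ("Lindamood (T)", "Tier 3", 0.69),
    ("Reading Rescue (T or TA)", "Tier 3", 0.40),
    ("Sound Partners (TA)", "Tier 3", 0.43),
    ("SMART (PV)", "Tier 3", 0.40),
    ("SPARK (PV)", "Tier 3", 0.51),
]


def programs_at_or_above(threshold, programs=PROGRAMS):
    """Return the (program, tier, ES) rows whose mean effect size meets the threshold."""
    return [row for row in programs if row[2] >= threshold]


for name, tier, es in programs_at_or_above(0.50):
    print(f"{name} ({tier}): {es:+.2f}")
```

With the table's values, this selects six programs: PALS, Lindamood (T 1-3), Targeted Reading Intervention, Early Steps, Lindamood (one-to-one), and SPARK.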

Schools, districts, and states should weigh the availability, practicality, and cost of these options to arrive at a workable plan. They then need to make sure that the programs are implemented well enough and long enough to obtain the outcomes seen in the research, or to improve on them.

But the inescapable conclusion from our review is that the gaps can be closed, using proven models that already exist. That’s big news, news that demands big changes.

Photo credit: Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Benchmark Assessments: Weighing the Pig More Often?

There is an old saying about educational assessment: “If you want to fatten a pig, it doesn’t help to weigh it more often.”

To be fair, it may actually help to weigh pigs more often, so the farmer knows whether the pigs are gaining weight at the expected rate and can do something in time if they are not.

It is surely correct that weighing pigs does no good in itself, but it may serve a diagnostic purpose. What matters is not the weighing, but rather what the farmer or veterinarian does based on the information provided by the weighing.

This blog is not, however, about porcine policy, but educational policy. In schools, districts, and even whole states, most American children take “benchmark assessments” roughly three to six times a year. These assessments are intended to tell teachers, principals, and other school leaders how students are doing, especially in reading and math. Ideally, benchmark assessments are closely aligned with state accountability tests, making it possible for school leaders to predict how whole grade levels are likely to do on the state tests early enough in the year to enable them to provide additional assistance in areas of need. The information might be as detailed as “fourth graders need help in fractions” or “English learners need help in vocabulary.”

Benchmark assessments are only useful if they improve scores on state accountability tests. Other types of intervention may be beneficial even if they do not make any difference in state test scores, but it is hard to see why benchmark assessments would be valuable if they do not in fact have any impact on state tests, or other standardized tests.

So here is the bad news: Research finds that benchmark assessments do not make any difference in achievement.

High-quality, large-scale randomized evaluations of benchmark assessments are relatively easy to do, and many have in fact been done. Use of benchmark assessments has been evaluated in elementary reading and math (see www.bestevidence.org). Here is a summary of the findings.

Area                     Number of Studies   Mean Effect Size
Elementary Reading       6                   -0.02
Elementary Math          4                   0.00
Study-Weighted Mean      10                  -0.01

In a rational world, these findings would put an end to benchmark assessments, at least as they are used now. The average outcomes are not just small; they are zero. Yet benchmark assessments use up a lot of student time and district money.

In our accountability-obsessed educational culture, how could use of benchmark assessments make no difference at all on the only measure they are intended to improve? I would suggest several possibilities.

The first, and perhaps the most likely, is that teachers and schools do not do much with the information from benchmark assessments. If you are trying to lose weight, you likely weigh yourself every day. But if you then make no systematic effort to change your diet or increase your exercise, all those weighings are of little value. In education, the situation is much worse than in weight reduction, because teachers are each responsible for 20-30 students. Results of benchmark assessments are different for each student, so a school staff that learns that its fourth graders need improvement in fractions finds it difficult to act on this information. Some fourth graders in every school are excelling in fractions, some just need a little help, and some are struggling in fractions because they missed the prerequisite skills. “Teach more fractions” is not a likely solution except for some of that middle group, yet differentiating instruction for all students is difficult to do well.

Second, it takes time to score and return benchmark assessments, so by the time a team of teachers decides how to respond to benchmark information, the situation has moved on.

Third, benchmark assessments may add little because teachers and principals already know a lot more about their students than any test can tell them. Imagine a principal receiving the information that her English learners need help in vocabulary. I’m going to guess that she already knows that. But more than that, she and her teachers know which English learners need what kind of vocabulary, and they have other measures and means of finding out. Teachers already give a lot of brief, targeted curriculum-linked assessments, and they always have. Further, wise teachers stroll around and listen in on students working in cooperative groups, or look at their tests or seatwork or progress on computer curriculum, to get a sophisticated understanding of why some students are having trouble, and ideas for what to do about it. For example, it is possible that English learners are lacking school-specific vocabulary, such as that related to science or social studies, and this observation may suggest solutions (e.g., teach more science and social studies). But what if some English learners are afraid or unwilling to express themselves in class, but sit quietly and never volunteer answers? A completely different set of solutions might be appropriate in this case, such as using cooperative learning or tutoring strategies to give students safe spaces in which to use the vocabulary they have, and gain motivation and opportunities to learn and use more.

Benchmark assessments fall into the enormous category of educational solutions that are simple, compelling, and wrong. Yes, teachers need to know what students are learning and what is needed to improve it, but they have available many more tools that are far more sensitive, useful, timely, and tied to actions teachers can take.

Eliminating benchmark assessments would save schools a lot of money. Perhaps that money could be redirected to professional development to help teachers use approaches actually proven to work. I know, that’s crazy talk. But perhaps if we looked at what students are actually doing and learning in class, we could stop weighing pigs and start improving teaching for all children.
