Could Proven Programs Eliminate Gaps in Elementary Reading Achievement?

What if every child in America could read at grade level or better? What if the number of students in special education for learning disabilities, or retained in grade, could be cut in half?

What if students who become behavior problems or give up on learning because of nothing more than reading difficulties could instead succeed in reading and no longer be frustrated by failure?

Today these kinds of outcomes are only pipe dreams. Despite decades of effort and billions of dollars directed toward remedial and special education, reading levels have barely increased.  Gaps between middle class and economically disadvantaged students remain wide, as do gaps between ethnic groups. We’ve done so much, you might think, and nothing has really worked at scale.

Yet today we have many solutions to the problems of struggling readers, solutions so effective that if widely and effectively implemented, they could substantially change not only the reading skills, but the life chances of students who are struggling in reading.

How do I know this is possible? The answer is that the evidence is there for all to see.

This week, my colleagues and I released a review of research on programs for struggling readers. The review, written by Amanda Inns, Cynthia Lake, Marta Pellegrini, and myself, uses academic language and rigorous review methods. But you don’t have to be a research expert to understand what we found out. In ten minutes, just reading this blog, you will know what needs to be done to have a powerful impact on struggling readers.

Everyone knows that there are substantial gaps in student reading performance according to social class and race. According to the National Assessment of Educational Progress, or NAEP, here are key gaps in terms of effect sizes at fourth grade:

Gap in Effect Sizes
No Free/Reduced lunch / Free/Reduced lunch    0.56
White/African American                        0.52
White/Hispanic                                0.46

These are big differences. In order to eliminate these gaps, we’d have to provide schools serving disadvantaged and minority students with programs or services sufficient to increase their reading scores by about a half standard deviation. Is this really possible?

Can We Really Eliminate Such Big and Longstanding Gaps?

Yes, we can. And we can do it cost-effectively.

Our review examined thousands of studies of programs intended to improve the reading performance of struggling readers. We found 59 studies of 39 different programs that met very high standards of research quality. 73% of the qualifying studies used random assignment to experimental or control groups, just as the most rigorous medical studies do. We organized the programs into response to intervention (RTI) tiers:

Tier 1 means whole-class programs, not just for struggling readers

Tier 2 means targeted services for students who are struggling to read

Tier 3 means intensive services for students who have serious difficulties.

Our categories were as follows:

Multi-Tier (Tier 1 + tutoring for students who need it)

Tier 1:

  • Whole-class programs

Tier 2:

  • Technology programs
  • One-to-small group tutoring

Tier 3:

  • One-to-one tutoring

We are not advocating for RTI itself, because the data on RTI are unclear. But it is just common sense to use proven programs with all students, then proven remedial approaches with struggling readers, then intensive services for students for whom Tier 2 is not sufficient.

Do We Have Proven Programs Able to Overcome the Gaps?

The table below shows average effect sizes for specific reading approaches. Wherever you see effect sizes that approach or exceed +0.50, you are looking at proven solutions to the gaps, or at least programs that could become a component in a schoolwide plan to ensure the success of all struggling readers.

Programs That Work for Struggling Elementary Readers

Program                                                Grades Proven  No. of Studies  Mean Effect Size
Multi-Tier Approaches
  Success for All                                      K-5            3               +0.35
  Enhanced Core Reading Instruction                    1              1               +0.24
Tier 1 – Classroom Approaches
  Cooperative Integrated Reading & Composition (CIRC)  2-6            3               +0.11
  PALS                                                 1              1               +0.65
Tier 2 – One-to-Small Group Tutoring
  Read, Write, & Type (T 1-3)                          1              1               +0.42
  Lindamood (T 1-3)                                    1              1               +0.65
  SHIP (T 1-3)                                         K-3            1               +0.39
  Passport to Literacy (TA 1-4/7)                      4              4               +0.15
  Quick Reads (TA 1-2)                                 2-3            2               +0.22
Tier 3 – One-to-One Tutoring
  Reading Recovery (T)                                 1              3               +0.47
  Targeted Reading Intervention (T)                    K-1            2               +0.50
  Early Steps (T)                                      1              1               +0.86
  Lindamood (T)                                        K-2            1               +0.69
  Reading Rescue (T or TA)                             1              1               +0.40
  Sound Partners (TA)                                  K-1            2               +0.43
  SMART (PV)                                           K-1            1               +0.40
  SPARK (PV)                                           K-2            1               +0.51

Key:
T: Certified teacher tutors
TA: Teaching assistant tutors
PV: Paid volunteers (e.g., AmeriCorps members)
1-X: For small group tutoring, the usual group size for tutoring (e.g., 1-2, 1-4)

(For more information on each program, see www.evidenceforessa.org)

The table is a road map to eliminating the achievement gaps that our schools have wrestled with for so long. It lists only programs that succeeded at a high level, relative to others at the same tier levels. See the full report or www.evidenceforessa.org for information on all programs.

It is important to note that there is little evidence of the effectiveness of tutoring in grades 3-5. Almost all of the evidence is from grades K-2. However, studies done in England in secondary schools have found positive effects of three reading tutoring programs in the English equivalent of U.S. grades 6-7. These findings suggest that when well-designed tutoring programs for grades 3-5 are evaluated, they will also show very positive impacts. See our review on secondary reading programs at www.bestevidence.org for information on these English middle school tutoring studies. On the same website, you can also see a review of research on elementary mathematics programs, which reports that most of the successful studies of tutoring in math took place in grades 2-5, another indicator that reading tutoring is also likely to be effective in these grades.

Some of the individual programs have shown effects large enough to overcome gaps all by themselves if they are well implemented (i.e., ES = +0.50 or more). Others have effect sizes lower than +0.50 but if combined with other programs elsewhere on the list, or if used over longer time periods, are likely to eliminate gaps. For example, one-to-one tutoring by certified teachers is very effective, but very expensive. A school might implement a Tier 1 or multi-tier approach to solve all the easy problems inexpensively, then use cost-effective one-to-small group methods for students with moderate reading problems, and only then use one-to-one tutoring with the small number of students with the greatest needs.
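For readers who want to check which programs in the table clear the +0.50 bar on their own, here is a minimal sketch. The effect sizes are copied from the table above; the tier labels, the variable names, and the choice to treat +0.50 as a hard cutoff are illustrative assumptions, not part of the review's method.

```python
# (tier, program, mean effect size), copied from the table in this post.
# The tuple layout and names are illustrative, not from the review itself.
programs = [
    ("Multi-Tier", "Success for All", 0.35),
    ("Multi-Tier", "Enhanced Core Reading Instruction", 0.24),
    ("Tier 1", "CIRC", 0.11),
    ("Tier 1", "PALS", 0.65),
    ("Tier 2", "Read, Write, & Type", 0.42),
    ("Tier 2", "Lindamood", 0.65),
    ("Tier 2", "SHIP", 0.39),
    ("Tier 2", "Passport to Literacy", 0.15),
    ("Tier 2", "Quick Reads", 0.22),
    ("Tier 3", "Reading Recovery", 0.47),
    ("Tier 3", "Targeted Reading Intervention", 0.50),
    ("Tier 3", "Early Steps", 0.86),
    ("Tier 3", "Lindamood", 0.69),
    ("Tier 3", "Reading Rescue", 0.40),
    ("Tier 3", "Sound Partners", 0.43),
    ("Tier 3", "SMART", 0.40),
    ("Tier 3", "SPARK", 0.51),
]

# Assumed cutoff: the "ES = +0.50 or more" rule of thumb from the text.
THRESHOLD = 0.50

# Keep only the programs whose mean effect size meets the cutoff.
meets = [(tier, name, es) for tier, name, es in programs if es >= THRESHOLD]

for tier, name, es in meets:
    print(f"{tier:<10} {name:<32} {es:+.2f}")
```

Programs just under the cutoff, such as Reading Recovery at +0.47, drop out of this list even though the text counts effect sizes that "approach" +0.50 as promising, so the filter is best read as a rough first pass.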

Schools, districts, and states should consider the availability, practicality, and cost of these solutions to arrive at a workable solution. They then need to make sure that the programs are implemented well enough and long enough to obtain the outcomes seen in the research, or to improve on them.

But the inescapable conclusion from our review is that the gaps can be closed, using proven models that already exist. That’s big news, news that demands big changes.

Photo credit: Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Benchmark Assessments: Weighing the Pig More Often?

There is an old saying about educational assessment: “If you want to fatten a pig, it doesn’t help to weigh it more often.”

To be fair, it may actually help to weigh pigs more often, so the farmer knows whether they are gaining weight at the expected levels. Then they can do something in time if this is not the case.

It is surely correct that weighing pigs does no good in itself, but it may serve a diagnostic purpose. What matters is not the weighing, but rather what the farmer or veterinarian does based on the information provided by the weighing.

This blog is not, however, about porcine policy, but educational policy. In schools, districts, and even whole states, most American children take “benchmark assessments” roughly three to six times a year. These assessments are intended to tell teachers, principals, and other school leaders how students are doing, especially in reading and math. Ideally, benchmark assessments are closely aligned with state accountability tests, making it possible for school leaders to predict how whole grade levels are likely to do on the state tests early enough in the year to enable them to provide additional assistance in areas of need. The information might be as detailed as “fourth graders need help in fractions” or “English learners need help in vocabulary.”

Benchmark assessments are only useful if they improve scores on state accountability tests. Other types of intervention may be beneficial even if they do not make any difference in state test scores, but it is hard to see why benchmark assessments would be valuable if they do not in fact have any impact on state tests, or other standardized tests.

So here is the bad news: Research finds that benchmark assessments do not make any difference in achievement.

High-quality, large-scale randomized evaluations of benchmark assessments are relatively easy to do. Many have in fact been done. Use of benchmark assessments has been evaluated in elementary reading and math (see www.bestevidence.org). Here is a summary of the findings.

                      Number of Studies   Mean Effect Size
Elementary Reading    6                   -0.02
Elementary Math       4                    0.00
Study-weighted mean   10                  -0.01
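
The study-weighted mean in the last row can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming "study-weighted" means each category's mean effect size is weighted by its number of studies; the function and variable names are illustrative.

```python
# (number of studies, mean effect size) per category, from the table above.
categories = {
    "Elementary Reading": (6, -0.02),
    "Elementary Math": (4, 0.00),
}


def study_weighted_mean(rows):
    """Weight each category's mean effect size by its number of studies."""
    total_studies = sum(n for n, _ in rows.values())
    weighted_sum = sum(n * es for n, es in rows.values())
    return total_studies, weighted_sum / total_studies


total, mean_es = study_weighted_mean(categories)
print(total, f"{mean_es:+.2f}")  # 10 -0.01
```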

In a rational world, these findings would put an end to benchmark assessments, at least as they are used now. The average outcomes are not just small, they are zero. And the assessments use up a lot of student time and district money.

In our accountability-obsessed educational culture, how could use of benchmark assessments make no difference at all on the only measure they are intended to improve? I would suggest several possibilities.

First, and perhaps most likely, teachers and schools do not do much with the information from benchmark assessments. If you are trying to lose weight, you likely weigh yourself every day. But if you then make no systematic effort to change your diet or increase your exercise, then all those weighings are of little value. In education, the situation is much worse than in weight reduction, because teachers are each responsible for 20-30 students. Results of benchmark assessments are different for each student, so a school staff that learns that its fourth graders need improvement in fractions finds it difficult to act on this information. Some fourth graders in every school are excelling in fractions, some just need a little help, and some are struggling in fractions because they missed the prerequisite skills. “Teach more fractions” is not a likely solution except for some of that middle group, yet differentiating instruction for all students is difficult to do well.

A second problem is that it takes time to score and return benchmark assessments, so by the time a team of teachers decides how to respond to benchmark information, the situation has moved on.

Third, benchmark assessments may add little because teachers and principals already know a lot more about their students than any test can tell them. Imagine a principal receiving the information that her English learners need help in vocabulary. I’m going to guess that she already knows that. But more than that, she and her teachers know which English learners need what kind of vocabulary, and they have other measures and means of finding out. Teachers already give a lot of brief, targeted curriculum-linked assessments, and they always have. Further, wise teachers stroll around and listen in on students working in cooperative groups, or look at their tests or seatwork or progress on computer curriculum, to get a sophisticated understanding of why some students are having trouble, and ideas for what to do about it. For example, it is possible that English learners are lacking school-specific vocabulary, such as that related to science or social studies, and this observation may suggest solutions (e.g., teach more science and social studies). But what if some English learners are afraid or unwilling to express themselves in class, but sit quietly and never volunteer answers? A completely different set of solutions might be appropriate in this case, such as using cooperative learning or tutoring strategies to give students safe spaces in which to use the vocabulary they have, and gain motivation and opportunities to learn and use more.

Benchmark assessments fall into the enormous category of educational solutions that are simple, compelling, and wrong. Yes, teachers need to know what students are learning and what is needed to improve it, but they have available many more tools that are far more sensitive, useful, timely, and tied to actions teachers can take.

Eliminating benchmark assessments would save schools a lot of money. Perhaps that money could be redirected to professional development to help teachers use approaches actually proven to work. I know, that’s crazy talk. But perhaps if we looked at what students are actually doing and learning in class, we could stop weighing pigs and start improving teaching for all children.

Do Different Textbooks Have Different Effects on Student Achievement?

The British comedy group Monty Python used to refer to “privileged glimpses into the perfectly obvious.”

And just last week, there they were. In a front-page article, the March 13 edition of Education Week reported that a six-state study of the achievement outcomes of different textbooks found . . . wait for it . . . near-zero relative effects on achievement measures (Sawchuk, 2019).

Really!

The study was led by Harvard’s Thomas Kane, a major proponent of the Common Core, who was particularly upset to find out that textbooks produced before and after the Common Core influenced textbook content had few if any differential effects on achievement.

I doubt that I am the only person who is profoundly unsurprised by these findings. For the past 12 years, I’ve been doing reviews of research on programs’ effects on achievement in rigorous research. Textbooks (or curricula) are usually one of the categories in my reviews. You can see the reviews at www.bestevidence.org. Here is a summary of the average effect sizes for textbooks or curricula:

Review                                       No. of Studies   Mean Effect Size
Elementary Reading (Inns et al., 2019)       9                +0.03
Elementary Math (Pellegrini et al., 2018)    16               +0.06
Secondary Math (Slavin et al., 2009)         40               +0.03
Secondary Science (Cheung et al., 2016)      8                +0.10
Weighted Average                             73               +0.04

None of these outcomes suggest that textbooks make much difference, and the study-weighted average of +0.04 is downright depressing.
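
As a quick check on the +0.04 figure, the sketch below recomputes the study-weighted average from the four reviews listed in the table. It assumes each review's mean effect size is weighted by its number of studies; the `Review` type and field names are hypothetical, introduced only for this illustration.

```python
from typing import NamedTuple


class Review(NamedTuple):
    """One row of the table: a review, its study count, and its mean effect size."""
    subject: str
    n_studies: int
    mean_es: float


# Values copied from the table in this post.
reviews = [
    Review("Elementary Reading", 9, 0.03),
    Review("Elementary Math", 16, 0.06),
    Review("Secondary Math", 40, 0.03),
    Review("Secondary Science", 8, 0.10),
]

total_studies = sum(r.n_studies for r in reviews)
weighted_avg = sum(r.n_studies * r.mean_es for r in reviews) / total_studies

print(f"{total_studies} studies, study-weighted average {weighted_avg:+.2f}")
# 73 studies, study-weighted average +0.04
```

Because Secondary Math contributes 40 of the 73 studies, its +0.03 dominates the average, which is why the overall figure sits well below the +0.10 for Secondary Science.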

Beyond the data, it is easy to see why evaluations of the achievement outcomes of textbooks rarely find significant positive outcomes. Such studies compare one textbook to another textbook that is usually rather similar. The reason is that textbook publishers respond to the demands of the market, not to evidence of effectiveness. New and existing textbooks were shaped by similar market forces. When standards change, as in the case of the Common Core State Standards in recent years, all textbook companies generally are forced to make changes in the same direction. There may be a brief window of time when new textbooks designed to meet new standards have a temporary advantage, but large publishers are extremely sensitive to such changes, and if they are not up to date in terms of standards today, they soon will be. Still, as the Kane et al. study found, changes in standards do not in themselves improve achievement on a substantial scale. Changes in standards do change market demand, which changes the content of textbooks, but fundamentally, the changes are not enough to make a measurable difference in learning.

Kane was quoted by Education Week as drawing the lesson from the study that perhaps it isn’t the textbooks that matter, but rather how the textbooks are used:

“What levels of coaching or more-intensive professional development are required to help teachers use rigorous materials at higher levels of fidelity, and does that produce larger benefits?” (Sawchuk, 2019, p. 17).

This sounds logical, but recent research in elementary mathematics calls this approach into question. Pellegrini et al. (2018) examined a category of programs that provide teachers with extensive professional development focused on math content and pedagogy. The average effect size across 12 studies was only +0.04, or essentially zero. In contrast, what did work very well were one-to-one and one-to-small group tutoring (mean effect size = +0.29) and professional development focused on classroom management and motivation (mean effect size = +0.25). In other words, programs focusing on helping teachers use standards-based materials added little if anything to the learning impact of textbooks. What mattered, beyond tutoring, were approaches that change classroom routines and relationships, such as cooperative learning or classroom management methods.

Changing textbooks matters little, and adding extensive professional development focused on standards adds even less. Instead, strategies that engage, excite, and accommodate individual needs of students are what we find to matter a great deal, across many subjects and grade levels.

This should be a privileged glimpse into the perfectly obvious. Everyone knows that textbooks make little difference. Walk through classrooms in any school, teaching any subject at any grade level. Some classes are exciting, noisy, fully engaged places in which students are eager to learn. Others are, well, teaching the textbook. In which type of class did you learn best? In which type do you hope your own children will spend their time in school, or wish they had?

What is obvious from the experience of every teacher and everyone who has ever been a student is that changing textbooks and focusing on standards do not in themselves lead to classrooms that kindle the love of learning. Imagine that you, as an accomplished adult educator, took a class in tennis, or Italian, or underwater basket weaving. Would a teacher using better textbooks and more advanced standards make you love this activity and learn from it? Or would a teacher who expresses enthusiasm for the subject and for the students, who uses methods that engage students in active social activities in every lesson, obtain better outcomes of every kind? I hope this question answers itself.

I once saw a science teacher in Baltimore teaching anatomy by having students take apart steamed crabs (a major delicacy in Baltimore). The kids were working in groups, laughing at this absurd idea, but they were learning like crazy, and learning to love science. I would submit that this experience, these connections among students, this laughter are the standards our schools need to attain. It’s not about textbooks, nor professional development on textbooks.

Another Baltimore teacher I knew taught a terrific unit on ancient Egypt. The students made their own sarcophagi, taking into the afterlife the things most important to them. Then the class went on a field trip to a local museum with a mummy exhibit, and finally, students made sarcophagi representing what Egyptians would value in the afterlife.  That’s what effective teaching is about.

The great 18th-century Swedish botanist Carl Linnaeus took his students on walks into forests, fields, and lakes around Uppsala University. Whatever they found, they brought back held high, singing and playing conch shell trumpets in triumph.  That’s what effective teaching is about.

In England, I saw a teacher teaching graph coordinates. She gave each student’s desk a coordinate, from 1, 1 to 5, 5, and put up signs labeled North, South, East, and West on the walls. She then made herself into a robot, and the students gave her directions to get from one coordinate to another. The students were laughing, but learning. That’s what effective teaching is about.

No textbook can compete with these examples of inspired teaching. Try to remember your favorite textbook, or your least favorite. I can’t think of a single one. They were all the same. I love to read and love to learn, and I’m sure anyone reading this blog is the same. But textbooks? Did a textbook ever inspire you to want to learn more or give you enthusiasm for any subject?

This is a privileged glimpse into the perfectly obvious to which we should devote our efforts in innovation and professional development. A textbook or standard never ignited a student’s passion or curiosity. Textbooks and standards may be necessary, but they will not transform our schools. Let’s use what we already know about how learning really happens, and then make certain that every teacher knows how to do the things that make learning engage students’ hearts and emotions, not just their minds.

References

Cheung, A., Slavin, R. E., Kim, E., & Lake, C. (2016). Effective secondary science programs: A best-evidence synthesis. Journal of Research in Science Teaching, 54(1), 58-81. doi: 10.1002/tea.21338

Inns, A., Lake, C., Byun, S., Shi, C., & Slavin, R. E. (2019). Effective Tier 1 reading instruction for elementary schools: A systematic review. Paper presented at the annual meeting of the Society for Research on Educational Effectiveness, Washington, D.C.

Pellegrini, M., Inns, A., Lake, C., & Slavin, R. E. (2018). Effective programs in elementary mathematics: A best-evidence synthesis. Manuscript submitted for publication.

Sawchuk, S. (2019, March 13). New texts failed to lift test scores in six-state study. Education Week, 38(25), 1, 17.

Slavin, R. E., Lake, C., & Groff, C. (2009). Effective programs in middle and high school mathematics: A best-evidence synthesis. Review of Educational Research, 79(2), 839-911.

 

How Tutoring Could Benefit Students Who Do Not Need It

If you’ve been following my blogs, or if you know research on tutoring, you know that tutoring is hugely beneficial to the students who receive it. Recent research in both reading and math is finding important impacts of forms of tutoring that are much less expensive and more scalable than the one-to-one tutoring by certified teachers that was once dominant. A review of research my colleagues and I did on effective programs for struggling readers found a mean effect size of +0.29 for one-to-small group tutoring provided by teaching assistants, across six studies of five programs involving grades K-5 (Inns, Lake, Pellegrini, & Slavin, 2018). Looking across the whole tutoring literature, in math as well as reading, positive outcomes of less expensive forms of tutoring are reliable and robust.

My focus today, however, is not on children who receive tutoring. It’s on all the other children. How does tutoring for the one third to one half of students in typical Title I schools who struggle in reading or math benefit the remaining students who were doing fine?

Imagine that Title I elementary schools had an average of three teaching assistants providing one-to-four tutoring in 7 daily sessions. This would enable them to serve 84 students each day, or perhaps 252 over the course of the year. Here is how this could benefit all children.
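
The capacity figures above follow from simple multiplication. Here is a minimal sketch; the function name and parameters are illustrative, and the three cohorts per year are an assumption inferred from the jump from 84 to 252 rather than something the text states.

```python
def tutoring_capacity(assistants, group_size, sessions_per_day, cohorts_per_year):
    """Return (students served per day, students served per year)."""
    daily = assistants * group_size * sessions_per_day
    yearly = daily * cohorts_per_year
    return daily, yearly


# Three teaching assistants, one-to-four groups, 7 daily sessions.
# cohorts_per_year=3 is an assumption: it is the turnover that yields 252.
daily, yearly = tutoring_capacity(
    assistants=3, group_size=4, sessions_per_day=7, cohorts_per_year=3
)
print(daily, yearly)  # 84 252
```

Changing `group_size` or `cohorts_per_year` shows how sensitive the yearly reach is to group size and to how long each student stays in tutoring.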

Photo credit: Courtesy of Allison Shelley/The Verbatim Agency for American Education: Images of Teachers and Students in Action

Eliminating within-class ability grouping.

Teachers justifiably complain about the difficulty of teaching highly diverse classes. Historically, they have dealt with diversity, especially in reading, by assigning students to top, middle, and low ability groups, so that they can provide appropriate levels of instruction for each group. Managing multiple ability groups is very difficult, because two-thirds of the class has to do seatwork (paper or digital) during follow-up time, while the teacher is working with another reading group. The seatwork cannot be challenging, because if it were, students would be asking questions, and the whole purpose of this seatwork is to keep students quiet so the teacher can teach a reading group. As a result, kids do what they do when they are bored and the teacher is occupied. It’s not pretty.

Sufficient high-quality one-to-four reading tutoring could add an effect size of at least +0.29 to the reading performance of every student in the low reading group. The goal would be to move the entire low group to virtual equality with the middle group. So some low achievers might need more and some less tutoring, and a few might need one-to-one tutoring rather than one-to-four. If the low and middle reading groups could be made similar in reading performance, teachers could dispense with within-class grouping entirely, and teach the whole class as one “reading group.” By eliminating seatwork, whole-class teaching would give every reading class three times as much valuable instructional time. This would be likely to benefit learning for students in the (former) middle and high groups directly (due to more high-quality teaching), as well as taking a lot of stress off of the teacher, making the classroom more efficient and pleasant for all.

Improving behavior.

Ask any teacher which students are most likely to act out in his or her class. It’s the low achievers. How could it be otherwise? Low achievers take daily blows to their self-esteem, and need to assert themselves in areas other than academics. One such “Plan B” for low achievers is misbehavior. If all students were succeeding in reading and math, improvements in behavior seem very likely. This would benefit all. I remember that my own very well-behaved daughter frequently came home from school very upset because other students misbehaved and got in trouble for it. Improved behavior due to greater success for low achievers would be beneficial to struggling readers themselves, but also to their classmates.

Improved outcomes in other subjects.

Most struggling students have problems in reading and math, and these are the only subjects in which tutoring is ever provided. Yet students who struggle in reading or math are likely to also have trouble in science, social studies, and other subjects, and these problems are likely to disrupt teaching and learning in those subjects as well. If all could succeed in reading and math, this would surely have an impact on other subjects, for non-struggling as well as struggling students.

Contributing to the teacher pipeline.

In the plan I’ve discussed previously, teaching assistants providing tutoring would mostly be ones with Bachelor’s degrees but not teaching certificates. These tutors would provide an ideal source of candidates for accelerated certification programs. Tutors who have apparent potential could be invited to enroll in such programs. The teachers developed in this way would be a benefit to all schools and all students in the district.  This aspect would be of particular value in inner city or rural areas that rely on teachers who grew up nearby and have roots in the area, as these districts usually have trouble attracting and retaining outsiders.

Reducing special education and retention.

A likely outcome of successful tutoring would be to reduce retentions and special education placements. This would be of great benefit to the students not retained or not sent to special education, but also to the school as a whole, which would save a great deal of money.

Ultimately, I think every teacher, every student, and every parent would love to see every low reading group improve in performance enough to eliminate the need for reading groups. The process to get to this happy state of affairs is straightforward and likely to succeed wherever it is tried. Wouldn’t a whole school and a whole school system full of success be a great thing for all students, not just the low achievers?

How do Textbooks Fit Into Evidence-Based Reform?

In a blog I wrote recently, “Evidence, Standards, and Chicken Feathers,” I discussed my perception that states, districts, and schools, in choosing textbooks and other educational materials, put a lot of emphasis on alignment with standards, and very little on evidence of effectiveness.  My colleague Steve Ross objected, at least in the case of textbooks.  He noted that it was very difficult for a textbook to prove its effectiveness, because textbooks so closely resemble other textbooks that showing a difference between them is somewhere between difficult and impossible.  Since the great majority of classrooms use textbooks (paper or digital) or sets of reading materials that collectively resemble textbooks, the control group in any educational experiment is almost certainly also using a textbook (or equivalents).  So as evidence becomes more and more important, is it fair to hold textbooks to such a difficult standard of evidence? Steve and I had an interesting conversation about this point, so I thought I would share it with other readers of my blog.

First, let me define a couple of key words.  Most of what schools purchase could be called commodities.  These include desks, lighting, carpets, non-electronic whiteboards, playground equipment, and so on. Schools need these resources to provide students with safe, pleasant, attractive places in which to learn. I’m happy to pay taxes to ensure that every child has all of the facilities and materials they need. However, no one should expect such expenditures to make a measurable difference in achievement beyond ordinary levels.

In contrast, other expenditures are interventions.  These include teacher preparation, professional development, innovative technology, tutoring, and other services clearly intended to improve achievement beyond ordinary levels.   Educators would generally agree that such investments should be asked to justify themselves by showing their effectiveness in raising achievement scores, since that is their goal.

By analogy, hospitals invest a great deal in their physical plants, furniture, lighting, carpets, and so on. These are all necessary commodities.   No one should have to go to a hospital that is not attractive, bright, airy, comfortable, and convenient, with plenty of parking.  These things may contribute to patients’ wellness in subtle ways, but no one would expect them to make major differences in patient health.  What does make a measurable difference is the preparation and training provided to the staff, medicines, equipment, and procedures, all of which can be (and are) constantly improved through ongoing research, development, and dissemination.

So is a textbook a commodity or an intervention?  If we accept that every classroom must have a textbook or its equivalent (such as a digital text), then a textbook is a commodity, just an ordinary, basic requirement for every classroom.  We would expect textbooks-as-commodities to be well written, up-to-date, attractive, and pedagogically sensible, and, if possible, aligned with state and national standards.  But it might be unfair and perhaps futile to expect textbooks-as-commodities to significantly increase student achievement in comparison to business as usual, because they are, in effect, business as usual.

If, somehow, a print or digital textbook, with associated professional development, digital add-ons, and so forth, turns out to be significantly more effective than alternative, state-of-the-art textbooks, then a textbook could also be considered an intervention, and marketed as such.  It would then be considered in comparison to other interventions that exist only, or primarily, to increase achievement beyond ordinary levels.

The distinction between commodities and interventions would be academic but for the appearance of the ESSA evidence standards.  The ESSA law requires that schools seeking school improvement funding select and implement programs that meet one of the top three standards (strong, moderate, or promising). It gives preference points on other federal grants, especially Title II (professional development), to applicants who promise to implement proven programs. Some states have applied more stringent criteria, and some have extended use of the standards to additional funding initiatives, including state initiatives.  These are all very positive developments. However, they are making textbook publishers anxious. How are they going to meet the new standards, given that their products are not so different from others now in use?

My answer is that I do not think it was the intent of the ESSA standards to forbid schools from using textbooks that lack evidence of effectiveness. To do so would be unrealistic, as it would wipe out at least 90% of textbooks.  Instead, the purpose of the ESSA evidence standards was to encourage and incentivize the use of interventions proven to be effective.  The concept, I think, was to assume that other funding (especially state and local funds) would support the purchase of commodities, including ordinary textbooks.  In contrast, the federal role was intended to focus on interventions to boost achievement in high-poverty and low-achieving schools.  Ordinary textbooks that are no more effective than any others are clearly not appropriate for those purposes, where there is an urgent need for approaches proven to have significantly greater impacts than methods in use today.

It would be a great step forward if federal, state, and local funding intended to support major improvements in student outcomes were held to tough standards of evidence.  Such programs should be eligible for generous and strategic funding from federal, state, and local sources dedicated to the enhancement of student outcomes.  But no one should limit schools in spending their funds on attractive desks, safe and fun playground equipment, and well-written textbooks, even though these necessary commodities are unlikely to accelerate student achievement beyond current expectations.

Photo credit: Laurentius de Voltolina [Public domain]

 This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

Evidence, Standards, and Chicken Feathers

In 1509, John Damian, an alchemist in the court of James IV of Scotland, proclaimed that he had developed a way for humans to fly. He made himself some wings from chicken feathers and jumped from the battlements of Stirling Castle, the Scottish royal residence at the time. His flight was brief but not fatal. He landed in a pile of manure, and only broke his thigh. Afterward, he explained that the problem was that he used the wrong kind of feathers. If only he had used eagle feathers, he could have flown, he asserted. Fortunately for him, he never tried flying again, with any kind of feathers.

The story of John Damian’s downfall is humorous, and in fact the only record of it is a contemporary poem making fun of it. Yet there are important analogies to educational policy today from this incident in Scottish history. These are as follows:

  1. Damian proclaimed the success of his plan for human flight before he or anyone else had tried it and found it effective.
  2. After his flight ended in the manure pile, he proclaimed (again without evidence) that if only he’d used eagle feathers, he would have succeeded. This makes sense, of course, because eagles are much better flyers than chickens.
  3. He was careful never to actually try flying with eagle feathers.

All of this is more or less what we do all the time in educational policy, with one big exception. In education, based on Damian’s experience, we might have put forward policies stating that from now on human-powered flight must only be done with eagle feathers, not chicken feathers.

What I am referring to in education is our obsession with standards as a basis for selecting textbooks, software, and professional development, and the relative lack of interest in evidence. Whole states and districts spend a lot of time devising standards and then reviewing materials and services to be sure that they align with these standards. In contrast, the idea of checking to see that texts, software, and PD have actually been evaluated and found to be effective in real classrooms with real teachers and students has been a hard slog.

Shouldn’t textbooks and programs that meet modern standards also produce higher student performance on tests closely aligned with those standards? This cannot be assumed. Not long ago, my colleagues and I examined every reading and math program rated “meets expectations” (the highest level) on EdReports, a website that rates programs in terms of their alignment with college- and career-ready standards. A not-so-grand total of two programs had any evidence of effectiveness on any measure not made by the publishers. Most programs rated “meets expectations” had no evidence at all, and a smaller number had been evaluated and found to make no difference.

I am not in any way criticizing EdReports.  They perform a very valuable service in helping schools and districts know which programs meet current standards. It makes no sense for every state and district to do this for themselves, especially in the cases where there are very few or no proven programs. It is useful to at least know about programs aligned with standards.

There is a reason that so few products favorably reviewed on EdReports have any positive outcomes in rigorous research. Most are textbooks, and very few textbooks have evidence of effectiveness. Why? The fact is that standards or no standards, EdReports or no EdReports, textbooks do not differ very much from each other in aspects that matter for student learning. Textbooks differ (somewhat) in content, but if there is anything we have learned from our many reviews of research on what works in education, what matters is pedagogy, not content. Yet since decisions about textbooks and software depend on standards and content, decision makers almost invariably select textbooks and software that have never been successfully evaluated.

Even crazy John Damian did better than we do. Yes, he claimed success in flying before actually trying it, but at least he did try it. He concluded that his flying plan would have worked if he’d used eagle feathers, but he never imposed this untested standard on anyone.

Untested textbooks and software probably don’t hurt anyone, but millions of students desperately need higher achievement, and focusing resources on untested or ineffective textbooks, software, and PD does not move them forward. The goal of education is to help all students succeed, not to see that they use aligned materials. If a program has been proven to improve learning, isn’t that a lot more important than proving that it aligns with standards? Ideally, we’d want schools and districts to use programs that are both proven effective and aligned with standards, but if no programs meet both criteria, shouldn’t those that are proven effective be preferred? Without evidence, aren’t we just giving students and teachers eagle feathers and asking them to take a leap of faith?

Photo credit: Humorous portrayal of a man who flies with wings attached to his tunic, Unknown author [Public domain], via Wikimedia Commons/Library of Congress

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.

The Mill and The School

On a recent trip to Scotland, I visited some very interesting oat mills. I always love to visit medieval mills, because I find it endlessly fascinating how people long ago used natural forces and materials – wind, water, and fire, stone, wood, and metal – to create advanced mechanisms that had a profound impact on society.

In Scotland, it’s all about oat mills (almost everywhere else, it’s wheat). These grain mills date back to the 10th century. In their time, they were a giant leap in technology. A mill is very complicated, but at its heart are two big innovations. In the center of the mill, a heavy millstone turns on top of another. The grain is poured through a hole in the top stone for grinding. The miller’s most difficult task is to maintain an exact distance between the stones. A few millimeters too far apart and no milling happens. A few millimeters too close and the heat of friction can ruin the machinery, possibly causing a fire.

The other key technology is the water wheel (except in windmills, of course). The water mill is part of a system that involves a carefully controlled flow of water from a millpond, which the miller uses to provide exactly the right amount of water to turn a giant wooden wheel, which powers the top millstone.

The medieval grain mill is not a single innovation, but a closely integrated system of innovations. Millers learned to manage this complex technology in a system of apprenticeship over many years.

Mills enabled medieval millers to obtain far more nutrition from an acre of grain than was possible before. This made it possible for land to support many more people, and the population surged. The whole feudal system was built around the economics of mills, and mills thrived through the 19th century.

What does the mill have to do with the school? Mills only grind well-behaved grain into well-behaved flour, while schools work with far more complex children, families, and all the systems that surround them. The products of schools must include joy and discovery, knowledge and skills.

Yet as different as they are, mills have something to teach us. They show the importance of integrating diverse systems that can then efficiently deliver desired outcomes. Neither a mill nor an effective school comes into existence because someone in power tells it to. Instead, complex systems, mills or schools, must be created, tested, adapted to local needs, and constantly improved. Once we know how to create, manage, and disseminate effective mills or schools, policies can be readily devised to support their expansion and improvement.

Important progress in societies and economies almost always comes about from development of complex, multi-component innovations that, once developed, can be disseminated and continuously improved. The same is true of schools. Changes in governance or large-scale policies can enhance (or inhibit) the possibility of change, but the reality of reform depends on creation of complex, integrated systems, from mills to ships to combines to hospitals to schools.

For education, what this means is that system transformation will come only when we have whole-school improvement approaches that are known to greatly increase student outcomes. Whole-school change is necessary because many individual improvements are needed to make big changes, and these must be carefully aligned with each other. Just as the huge water wheel and the tiny millstone adjustment mechanism and other components must work together in the mill, the key parts of a school must work together in synchrony to produce maximum impact, or the whole system fails to work as well as it should.

For example, if you look at research on proven programs, you’ll find effective strategies for school management, for teaching, and for tutoring struggling readers. These are all well and good, but they work so much better if they are linked to each other.

To understand this, first consider tutoring. Especially in the elementary grades, there is no more effective strategy. Our recent review of research on programs for struggling readers finds that well-qualified teaching assistants can be as effective as teachers in tutoring struggling readers, and that while one-to-four tutoring is less effective than one-to-one, it is still a lot more effective than no tutoring. So an evidence-oriented educator might logically choose to implement proven one-to-one and/or one-to-small group tutoring programs to improve school outcomes.

However, tutoring only helps the students who receive it, and it is expensive. A wise school administrator might reason that tutoring alone is not sufficient, and that improving the quality of classroom instruction is also essential, both to improve outcomes for students who do not need tutoring and to reduce the number of students who do need tutoring. There is an array of proven classroom methods the principal or district might choose to improve student outcomes in all subjects and grade levels (see www.evidenceforessa.org).

But now consider students who are at risk because they are not attending regularly, or have behavior problems, or need eyeglasses but do not have them. Flexible school-level systems are necessary to ensure that students are in school, eager to learn, well-behaved, and physically prepared to succeed.

In addition, there is a need to have school principals and other leaders learn strategies for making effective use of proven programs. These would include managing professional development, coaching, monitoring implementation and outcomes of proven programs, distributed leadership, and much more. Leadership also requires jointly setting school goals with all school staff and monitoring progress toward these goals.

These are all components of the education “mill” that have to be designed, tested, and (if effective) disseminated to ever-increasing numbers of schools. Like the mill, an effective school design integrates individual parts, makes them work in synchrony, constantly assesses their functioning and output, and adjusts procedures when necessary.

Many educational theorists argue that education will only change when systems change. Ferocious battles rage about charters vs. ordinary public schools, about adopting policies of countries that do well on international tests, and so on. These policies can be important, but they are unlikely to create substantial and lasting improvement unless they lead to development and dissemination of proven whole-school approaches.

Effective school improvement is not likely to come about from let-a-thousand-flowers-bloom local innovation, nor from top-level changes in policy or governance. Sufficient change will not come about by throwing individual small innovations into schools and hoping they will collectively make a difference. Instead, effective improvement will take root when we learn how to reliably create effective programs for schools, implement them in a coordinated and planful way, find them effective, and then disseminate them. Once such schools are widespread, we can build larger policies and systems around their needs.

Coordinated, schoolwide improvement approaches offer schools proven strategies for increasing the achievement and success of their children. There should be many programs of this kind, among which schools and districts can choose. A school is not the same as a mill, but the mill provides at least one image of how creating complex, integrated, replicable systems can change whole societies and economies. We should learn from this and many other examples of how to focus our efforts to improve outcomes for all children.

Photo credit: By Johnson, Helen Kendrik [Public domain], via Wikimedia Commons

This blog was developed with support from the Laura and John Arnold Foundation. The views expressed here do not necessarily reflect those of the Foundation.