Do Textbooks Matter?

Recently, some colleagues and I spoke with a group of superintendents about how they use evidence to select educational programs. Although they had many useful insights, it quickly became clear that when we said programs, they thought we meant textbooks.

But a textbook is not a program.

A program is a set of coordinated strategies designed to improve student achievement. A hallmark of programs is that they almost invariably include a great deal of professional development. Textbooks almost invariably do not. A half-day inservice is typical of textbook adoptions, while programs generally provide many days of inservice, plus on-site coaching and feedback, online or in-school discussions, and so on. Programs may also include textbooks, other curricula, or software, but they are focused on changing teachers’ behaviors in the classroom, not just changing content.

Content is important, of course, but changing textbooks almost never changes outcomes on achievement tests. My colleagues and I have published reviews of research on elementary and secondary reading, math, and science. In every one of these reviews, changing textbooks is one category of intervention that has been studied, often in very large randomized experiments. Yet textbooks never make much of a difference on average, and they rarely show significant differences in even a single qualifying study. These studies usually use standardized tests as outcome measures, and many textbook innovations focus precisely on closer alignment with current standards and assessments. Yet that strategy has been tried and evaluated many times, and it almost never works.

What does work, in contrast, are programs that provide a great deal of professional development on well-defined models of teaching, such as cooperative learning and the teaching of metacognitive skills.

Not every study of professional development approaches shows increases in achievement, and other factors help determine which innovations are more and less effective. But on average, the difference between professional development approaches and textbook approaches is crystal clear, and it applies across subjects and grade levels.

So when your textbooks are worn out, or you are tired of them, go ahead and replace them with shiny new textbooks or digital texts. It won’t make any difference in students’ learning, but no one wants students to have shabby or outdated materials. When you decide to do something to improve student learning, however, do not follow your textbook adoption cycle. Instead, find proven programs with outstanding and sufficient professional development. Your kids, parents, and colleagues will be glad you did.

Why Rigorous Studies Get Smaller Effect Sizes

When I was a kid, I was a big fan of the hapless Washington Senators. They were awful. Year after year, they were dead last in the American League. They were the sort of team that builds diehard fans not despite but because of their hopelessness. Every once in a while, kids I knew would snap under the pressure and start rooting for the Baltimore Orioles. We shunned them forever, right up to this day.

With the Senators, any reason for hope was prized, and we were all very excited when some hotshot batter was brought up from the minor leagues. But they almost always got whammed, and were sent back down or traded, never to be heard from again. I’m sure this happens on every team. In fact, I just saw an actual study comparing batting averages for batters in their last year in the minors to their first year in the majors. The difference was dramatic: in the majors, the very same batters had much lower averages, an impact equivalent to an effect size of -0.70. That’s huge. I’d call this effect the Curse of the Major Leagues.
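For readers who want to see the arithmetic, an effect size here is just a standardized mean difference (Cohen’s d): the gap between two group means divided by their pooled standard deviation. The sketch below shows the computation with invented batting figures, chosen only so the result comes out to -0.70; they are not the numbers from the study.

```python
# Minimal sketch of the effect-size arithmetic (Cohen's d).
# All numbers are hypothetical, chosen only to show how a -0.70 arises;
# they are not taken from the batting study mentioned above.

def cohens_d(mean_a: float, mean_b: float, pooled_sd: float) -> float:
    """Standardized mean difference: (a - b) / pooled SD."""
    return (mean_a - mean_b) / pooled_sd

# Suppose the same batters hit .280 in their last minor-league year and .252
# in their first major-league year, with a pooled SD of .040 across batters.
d = cohens_d(mean_a=0.252, mean_b=0.280, pooled_sd=0.040)
print(f"d = {d:.2f}")  # -0.70: a very large drop, in standard-deviation units
```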

Why am I carrying on about baseball? I think it provides an analogy that explains why large, randomized experiments in education characteristically have lower effect sizes than studies that are quasi-experimental, smaller, or (especially) both.

In baseball, batting averages decline because the competition is tougher. The pitchers are faster, the fielders are better, and maybe the minor league parks are smaller, I don’t know. In education, large randomized experiments are tougher competition, too. They are tougher because the experimenter does not get the benefit of self-selection by the schools or teachers who choose the program. In a randomized experiment, everyone has to start fresh at the beginning of the study, so the experimenter also cannot count on working with teachers who are already experienced in the experimental program.
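A toy simulation can make the self-selection point concrete. In the sketch below, every number (the true effect of 0.20, the “readiness” variable, the sample sizes) is invented purely for illustration: schools vary in a latent readiness that raises outcomes on its own, so when the most ready schools opt into the program, the apparent effect size balloons far beyond the program’s true effect, while random assignment recovers it honestly.

```python
# Toy simulation (all numbers invented): why self-selection inflates effect sizes.
import random
import statistics

random.seed(0)
TRUE_EFFECT = 0.20  # the program's real effect, in control-group SD units
N = 10_000          # schools per condition

def outcome(readiness: float, treated: bool) -> float:
    # Readiness and noise each contribute half the outcome variance,
    # so control-group outcomes have a standard deviation of about 1.
    noise = random.gauss(0, 0.5 ** 0.5)
    return readiness + (TRUE_EFFECT if treated else 0.0) + noise

def cohens_d(treat: list, control: list) -> float:
    # Standardized mean difference with a pooled within-group SD.
    pooled_var = (statistics.variance(treat) + statistics.variance(control)) / 2
    return (statistics.mean(treat) - statistics.mean(control)) / pooled_var ** 0.5

readiness = [random.gauss(0, 0.5 ** 0.5) for _ in range(2 * N)]

# Quasi-experiment: the most "ready" half of the schools opts into the program.
ranked = sorted(readiness)
quasi_treat = [outcome(r, True) for r in ranked[N:]]
quasi_ctrl = [outcome(r, False) for r in ranked[:N]]

# Randomized experiment: assignment ignores readiness entirely.
rct_treat = [outcome(r, True) for r in readiness[:N]]
rct_ctrl = [outcome(r, False) for r in readiness[N:]]

print(f"quasi-experiment: d = {cohens_d(quasi_treat, quasi_ctrl):.2f}")  # far above 0.20
print(f"randomized trial: d = {cohens_d(rct_treat, rct_ctrl):.2f}")      # close to 0.20
```

In this invented setup, the quasi-experiment reports an effect several times the true 0.20, while the randomized trial lands close to it. The inflation comes entirely from who chose the program, not from what the program did.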

In larger studies, the experimenter has more difficulty controlling every variable to ensure high-quality implementation. Large studies are also more likely to use standardized tests rather than researcher-made tests, and if those are state tests used for accountability, the control group can be assumed to be trying just as hard as the experimental group to improve students’ scores on the objectives those tests measure.

What these problems mean is that when a program is evaluated in a large randomized study and the results are significantly positive, there is cause for real celebration, because the program had to overcome much tougher competition. A successful program of this kind is far more likely to work in realistic settings at serious scale, because it has been tested under more life-like conditions. Other experimental designs are also valuable, of course, if only because they act like the minor leagues, nurturing promising prospects and then sending the best to the majors, where their mettle will really be tested. In a way, this is exactly the tiered evidence strategy used in Investing in Innovation (i3) and in the Institute of Education Sciences (IES) Goal 2-3-4 progression. In both cases, smaller grants are made available for development projects, which are nurtured and, if they show promise, may be funded at a higher level and sent to the majors (validation, scale-up) for rigorous, large-scale evaluation.

The Curse of the Major Leagues was really just the product of a system for fairly and efficiently bringing the best players into the major leagues. The same idea is the brightest hope we have for offering schools throughout the U.S. the very best instructional programs on a meaningful scale. After all those years rooting for the Washington Senators, I’m delighted to see something really powerful coming from our actual Senators in Washington. And I don’t mean baseball!

What if Evidence Doesn’t Match Ideology?

Several years ago, when the Conservative Party was first coming into office in the U.K., I had an opportunity to meet with a High Government Official. He had been told that I was a supporter of phonics in early reading, and that was what he wanted to talk about. We chatted amicably for some time about our agreement on this topic.

Then the Great Man turned to another topic. What did I think about the evidence on ability grouping?

I explained that the evidence did not favor ability grouping, and was about to explain why when he cut me off with the internationally understood gesture meaning, “I’m a very busy and important person. Get out of my office immediately.” Ever since then, the British government has gotten along just fine without my advice.

What the Great Man was telling me, of course, is the depressing reality of why it is so difficult to change policy or practice with evidence. Most people value research when it supports the ideological position they already hold, and reject it when it does not. The result is that policy and practice remain an ideological struggle, little influenced by the actual findings of research. Advocates of a given position seek evidence to throw at their opponents or to defend themselves from evidence thrown at them by the “other side.” And all too often we evaluate evidence based on the degree to which it corresponds to our pre-existing beliefs, rather than re-evaluating our beliefs in light of the evidence. I recall that at a meeting of IES grantees, a respected superintendent spoke to the whole assemblage and, entirely without irony or humor, defined good research as that which confirms his beliefs, and bad research as that which contradicts his beliefs.

A scientific field only begins to move forward when researchers and users of research come to accept findings whether or not those findings support their previous beliefs. Not that this is easy. Even in the most scientific of fields, it usually takes a great deal of research over an extended period to replace a widely accepted belief with a contrary set of findings. In the course of unseating the old belief, researchers who dare to go against the current orthodoxy have difficulty finding an audience, funding, promotions, or respect, so it’s a lot easier to go with the flow. Yet true sciences do change their minds based on evidence, even if they must often be dragged kicking and screaming to the altar of knowledge. One classic example is the bacterial origin of gastric ulcers. Ulcers were once thought to be caused by stress, until a then-obscure Australian researcher (Barry Marshall, who later shared a Nobel Prize for the discovery) deliberately gave himself an ulcer by drinking a solution swarming with the bacterium Helicobacter pylori. He then cured himself with antibiotics known to kill those bacteria. Today, the stress theory is gone and the bacterial theory is dominant, but it wasn’t easy.

We education researchers are only just beginning to have enough confidence in our own research to expect policy makers, practitioners, and other researchers to change their beliefs on the basis of evidence. Yet education will not be an evidence-driven field until evidence routinely changes beliefs about what works for students and what does not. We need to change thinking not only about individual programs or principles, but about the role of evidence itself. This is one reason it is so important that research in education be of impeccable quality, so that we can have confidence that findings will replicate in future studies and generalize to many practical applications.

A high government official in health would never dismiss research on gastric ulcers because he or she still believed that ulcers are caused by stress. A high government official in agriculture would never dismiss research on the effects of certain farming methods on soil erosion. In the U.S., at least, our Department of Education has begun to value evidence and to encourage schools to adopt proven programs and practices, but there is a long way to go before education joins medicine and agriculture in willingness to recognize and promote findings of rigorous and replicated research. We’re headed in the right direction, but I have to admit that the difficulties getting there are giving me one heck of an ulcer.*

*Just kidding. I’m fine.