Research and Development Saved Britain. Maybe They Will Save U.S. Education

One of my summer goals is to read the entire six-volume history of the Second World War by Winston Churchill. So far, I’m about halfway through the first volume, The Gathering Storm, about the period leading up to 1939.

The book is more or less a wonderfully written rant about the Allies’ shortsightedness. As Hitler built up his armaments, Britain, France, and their allies maintained a pacifist insistence on reducing theirs. Only in the mid-thirties, when war was inevitable, did Britain start investing in armaments, but even then at a very modest pace.

Churchill was a Member of Parliament but was out of government. However, he threw himself into the one thing he could do to help Britain prepare: research and development. In particular, he worked with top scientists to develop the capacity to track, identify, and shoot down enemy aircraft.

When the 1940 Battle of Britain came and German planes tried to destroy and demoralize Britain in advance of an invasion, the inventions by Churchill’s group were a key factor in defeating them.

Churchill’s story is a good analogue to the situation of education research and development. In the current environment, the best-evaluated, most effective programs are not in wide use in U.S. schools. But the research and development that creates and evaluates these programs is essential. It is useful right away in hundreds of schools that do use proven programs already. But imagine what would happen if federal, state, or local governments anywhere decided to use proven programs to combat their most important education problems at scale. Such a decision would be laudable in principle, but where would the proven programs come from? How would they generate convincing evidence of effectiveness? How would they build robust and capable organizations to provide high-quality professional development, materials, and software?

The answer is research and development, of course. Just as Churchill and his scientific colleagues had to create new technologies before Britain was willing to invest in air defenses and air superiority at scale, so American education needs to prepare for the day when government at all levels is ready to invest seriously in proven educational programs.

I once visited a secondary school near London. It’s an ordinary school now, but in 1940 it was a private girls’ school. A German plane, shot down in the Battle of Britain, crash landed near the school. The girls ran out and captured the pilot!

The girls were courageous, as was the British pilot who shot down the German plane. But the advanced systems the British had worked out and tested before the war were also important to saving Britain. In education reform we are building and testing effective programs and organizations to support them. When government decides to improve student learning nationwide, we will be ready, if investments in research and development continue.

This blog is sponsored by the Laura and John Arnold Foundation

First, Do No Harm: The Blind Duchess

One of the great strengths of the evidence movement in education has been its bipartisan nature. Democrats and Republicans, liberals and conservatives have equal reasons to want to know what works, and to try to ensure that government funds will be spent primarily on programs and practices known to work from rigorous experiments. Politics plays a legitimate role in determining how evidence is put to use and what values should underpin policies in education, but whatever one’s politics, everyone should agree that it’s essential to know what works.

Yet while it’s easy to conclude that we should promote what does work, it’s not so easy to decide what to do in areas in which there is insufficient evidence. We want to gradually replace programs and practices not known to work with those that do have strong evidence, but what do we do while the evidence base is growing?

I recently took a tour of Chatsworth, a huge, ornate great house that since the 1600s has been the family seat of the Dukes of Devonshire, one of the wealthiest families in England. Our guide told us about a famous duchess, Georgiana (a distant relative of Lady Diana). In the late 1700s, Georgiana suffered from irritated eyes. Her physician had her bathe her eyes in a mixture of milk and vinegar, and then applied leeches. As a consequence, she went blind.

The duchess’ physician ignored the first principle of medicine, traditionally associated with the Hippocratic Oath: “First, do no harm.” I think it is safe to assume that the Duchess of Devonshire could have had any doctor in Europe, and that the one she chose was considered one of the best. Yet even a duke or duchess or a king or queen could not obtain the kind of routine medical care we take for granted today. But what their doctors could at least do was to take care to avoid making things worse. Recall that around the same time, King George III suffered from insanity, perhaps caused by his physicians, and George Washington was killed by his leech-using doctors.

Today, in education, we face a different set of problems, but we must start with the Hippocratic principle: First, do no harm. But for us, doing no harm is less than straightforward.

In educational practice, we have a growing but still modest number of proven interventions. As I’ve noted previously, our Evidence for ESSA website contains approximately 100 reading and math programs for grades K-12 that meet current ESSA evidence standards. That’s impressive, but it is still a smaller number of proven programs than we’d like, especially in secondary schools and in mathematics. We are now working on the category of science, which has fewer proven programs, and we know that writing will have fewer still.

In all of education research, there are very few programs known to do actual harm, so we don’t really have to worry too much about the Duchess of Devonshire’s problem. What we have instead is a growing number of proven and promising programs and a very large number of programs that have not been evaluated at all, or not well enough to meet current standards, or with mixed outcomes.

For educators, “First, do no harm” may be taken to mean, “use programs proven to be effective when they exist, but stick with promising approaches until better ones have been validated.” That is, in areas in which there are many programs with strong, positive evidence of effectiveness, select one of these and implement it with care. But in areas in which few programs exist, use the best available, rather than insisting on perfect evidence.

One example of what I am talking about is after-school programs. Under federal funding called 21st Century Community Learning Centers (21st CCLC), after-school programs have been widespread. Several years ago, an evaluation of 21st CCLC found few benefits for student achievement, and there are few if any proven models in broad scale use. So how should the federal government respond?

I would argue that the principle of “First, do no harm” would support continuing but significantly modifying 21st CCLC or other after-school funding. Federal support for after-school programs might be reformed to focus on development and evaluation of programs that improve achievement outcomes. In this way, federal dollars continue to support a popular and perhaps useful service, but more importantly they support R&D to find out which forms of that service produce the desired outcomes. The same approach might be applied to career and technical education and many other areas in which there is substantial federal, state, or local investment, but little evidence of what works. In each case, funds currently supporting popular but unproven services could be shifted to supporting development, evaluation, and dissemination of proven, effective strategies designed to meet the activity’s goal.

Instead of potentially harming students or taking away funding altogether, such a strategy could open up new areas of inquiry that would be sure to eventually create and validate effective programs where they do not exist today.

In education, “First, do no harm” should not justify abandonment of whole areas of education services that lack a sufficient selection of proven approaches. Instead, it means supplementing service dollars with R&D dollars to find out what works. We cannot justify the kinds of treatment the Duchess of Devonshire received for her irritated eye, but we also cannot justify using her case to give up on the search for effective treatments.

Research and Practice: “Tear Down This Wall”

I was recently in Berlin. Today, it’s a lively, entirely normal European capital. But the first time I saw it, it was 1970, and the wall still divided it. Like most tourists, I went through Checkpoint Charlie to the east side. The two sides were utterly different. West Berlin was pleasant, safe, and attractive. East Berlin was a different world. On my recent trip, I met a young researcher who grew up in West Berlin. He recalls his father being taken in for questioning because he accidentally brought a West Berlin newspaper across the border. Western people could visit, but western newspapers could get you arrested.

I remember John F. Kennedy’s “Ich bin ein Berliner” speech, and Ronald Reagan’s “Mr. Gorbachev, tear down this wall.” And one day, for reasons no one seems to understand, the wall was gone. Even today, I find it thrilling and incredible to walk down Unter den Linden and through the Brandenburg Gate. Not so long ago, this was impossible, even fatal.

The reason I bring up the Berlin Wall is that I want to use it as an analogy to another wall of less geopolitical consequence, perhaps, but very important to our profession. This is the wall between research and practice.

It is not my intention to disrespect the worlds on either side of the research/practice wall. People on both sides care deeply about children and bring enormous knowledge, skill, and effort to improving educational outcomes. In fact, that’s what is so sad about this wall. People on both sides have so much to teach and learn from the other, but all too often, they don’t.

What has been happening in recent years is that the federal government, at least, has been reinforcing the research/practice divide in many ways, at least until the passage of the Every Student Succeeds Act (ESSA) (more on this later). On one hand, government has invested in high-quality educational research and development, especially through Investing in Innovation (i3) and the Institute of Education Sciences (IES). As a result, over on the research side of the wall there is a growing stockpile of rigorously evaluated, ready-to-implement education programs for most subjects and grade levels.

On the practice side of the wall, however, government has implemented national policies that may or may not have a basis in research, but definitely do not focus on use of proven programs. Examples include accountability, teacher evaluation, and Common Core. Even federal School Improvement Grants (SIG) for the lowest-achieving 5% of schools in each state had loads of detailed requirements for schools to follow but said nothing at all about using proven programs or practices, until a proven whole-school reform option was permitted as one of six alternatives at the very end of No Child Left Behind. The huge Race to the Top funding program was similarly explicit about standards, assessments, teacher evaluations, and other issues, but said nothing about use of proven programs.

On the research side of the wall, developers and researchers were being encouraged by the U.S. Department of Education to write their findings clearly and “scale up” their findings to presumably eager potential adopters on the practice side. Yet the very same department was, at the same time, keeping education leaders on the practice side of the wall scrambling to meet federal standards to obtain Race to the Top, School Improvement Grants, and other funding, none of which had anything much to do with the evidence base building up on the research side of the wall. The problem posed by the Berlin Wall was not going to be resolved by sneaking well-written West Berlin newspapers into East Berlin, or East Berlin newspapers into West Berlin. Rather, someone had to tear down the wall.

The Every Student Succeeds Act (ESSA) is one attempt to tear down the research/practice wall. Its definitions of strong, moderate, and promising levels of evidence, and provision of funding incentives for using proven programs (especially in applications for school improvement), could go a long way toward tearing down the research/practice wall, but it’s too soon to tell. So far, these definitions are just words on a page. It will take national, state, and local leadership to truly make evidence central to education policy and practice.

On National Public Radio, I recently heard recorded recollections from people who were in Berlin the day the wall came down. One of them really stuck with me. West Berliners had climbed to the top of the wall and were singing and cheering as gaps were opened. Then, an East German man headed for a gap. The nearby soldiers, unsure what to do, pointed their rifles at him and told him to stop. He put his hands in the air. The West Germans on the wall fell silent, anxiously watching.

A soldier went to find the captain. The captain came out of a guardhouse and walked over to the East German man. He put his arm around his shoulders and personally walked him through the gap in the wall.

That’s leadership. That’s courage. It’s what we need to tear down our wall: leaders at all levels who actively encourage the world of research and the world of practice to become one. To do it by personal and public examples, so that educators can understand that the rules have changed, and that communication between research and practice, and use of proven programs and practices, will be encouraged and facilitated.

Our wall can come down. It’s only a question of leadership, and commitment to better outcomes for children.

The Age of Evidence

In 1909, most people outside of cities had never seen an automobile. Those that existed frequently broke down, and there were few mechanics. Roads were poor, fuel was difficult to obtain, and spare parts were scarce. The automobile industry had not agreed on the best form of propulsion, so steam-powered cars, electric cars, and diesel cars shared the road with gasoline-powered cars. The high cost of cars made them a rich man’s hobby and a curiosity rather than a practical necessity for most people.

Yet despite all of these limitations, anyone with eyes to see knew that the automobile was the future.

I believe that evidence in education is at a similar point in its development. There are still not enough proven programs in all fields and grade levels. Educators are just now beginning to understand what proven programs can do for their children. Old-fashioned textbooks and software lacking a scintilla of evidence still dominate the market. Many schools that do adopt proven programs may still not get promised outcomes because they shortchange professional development, planning, or other resources.

Despite all of these problems, any educator or policy maker with eyes to see knows that evidence is the future.

There are many indicators that the Age of Evidence is upon us. Here are some I’d point to.

· The ESSA evidence standards. The definitions in the ESSA law of strong, moderate, and promising levels of evidence and incentives to use programs that meet them are not yet affecting practice on a large scale, but they are certainly leading to substantial discussion about evidence among state, district, and school leaders. In the long run, this discussion may be as important as the law itself in promoting the use of evidence.

· The availability of many more proven programs. Our Evidence for ESSA website found approximately 100 K-12 reading and math programs meeting one of the top three ESSA standards. Many more are in the pipeline.

· Political support for evidence is growing and non-partisan. Note that the ESSA standards were passed with bipartisan support in a Republican Congress. This is a good indication that evidence is becoming a consensus “good government” theme, not just something that professors do.

· We’ve tried everything else. Despite their commendable support for research, both the G.W. Bush and the Obama administrations mainly focused on policies that ignored the existence of proven programs. Progress in student performance was disappointing. Perhaps next time, we’ll try using what works.

Any of these indicators could experience setbacks or reversals, but in all of modern history, it’s hard to think of cases in which, once the evidence/innovation genie is out of the bottle, it is forced back inside. Progress toward the Age of Evidence may be slower or more uneven than we’d like, but this is an idea that once planted tends to persist, and to change institutions.

If we have proven, better ways to teach reading or math or science, to increase graduation rates and college and career readiness, or to build students’ social and emotional skills and improve classroom behavior, then sooner or later policy and practice must take this evidence into account. When it does, it will kick off a virtuous cycle in which a taste for evidence among education leaders leads to substantial investments in R&D by government and the private sector. This will lead to creation and successful evaluation of better and better educational programs, which will progressively add to the taste for evidence, feeding the whole cycle.

The German philosopher Schopenhauer once said that every new idea is first ridiculed, then vehemently opposed, and then accepted as self-evident. I think we are nearing a turning point, where resistance to the idea of evidence of effectiveness as a driver in education is beginning to give way to a sense that of course any school should be using proven programs. Who would argue otherwise?

Other fields, such as medicine, agriculture, and technology, including automotive technology, long ago reached a point of no return, when innovation and evidence of effectiveness began to expand rapidly. Because education is mostly a creature of government, it has been slower to change, but change is coming. And when this point of no return arrives, we’ll never look back. As new teaching approaches, new uses of technology, new strategies for engaging students with each other, new ways of simulating scientific, mathematical, and social processes, and new ways of accommodating student differences are created, successfully evaluated, and disseminated, education will become an exciting, constantly evolving field. And no one will even remember a time when this was not the case.

In 1909, the problems of automotive engineering were daunting, but there was only one way things were going to go. True progress has no reverse gear. So it will be in education, as our Age of Evidence dawns.

Make No Small Plans

In recent years, an interest has developed in very low-cost interventions that produce small but statistically significant effects on achievement. The argument for their importance is that their costs are so low that their impacts are obtained very cost-effectively. For example, there is evidence that a brief self-affirmation exercise can produce a small but significant effect on achievement, and that a brief intervention to reduce “social identity threat” can do the same. A study in England found that a system to send 50 text messages over the course of a school year, announcing upcoming tests and homework assignments, feedback on grades, test results, and attendance, and updates on topics being studied in school, improved math achievement slightly but significantly, at a cost of about $5 a year.

There is nothing wrong with these mini-interventions, and perhaps all schools should use them. Why not? Yet I find myself a bit disturbed by this type of approach.

Step back from the small-cost/small-but-significant outcome and consider the larger picture, the task in which all who read this blog are jointly engaged. We face an educational system that is deeply dysfunctional. Disadvantaged students remain far, far behind middle-class students in educational outcomes, and the gap has not narrowed very much over decades. The U.S. remains well behind peer nations in achievement and is not catching up. Dropout rates in the U.S. are diminishing, but skill levels of American high school graduates from disadvantaged schools are appalling.

For schools with limited budgets to spend on reform, it may be all they can do to adopt a low-cost/low-but-significant outcome intervention on the basis that it’s better than nothing. But again, step back to look at the larger situation. The average American student is educated at a cost of more than $11,000 per year. There are whole-school reform approaches, such as our own Success for All in elementary and middle schools and BARR in secondary schools, that cost around $100 per student per year, and have been found to make substantial differences in student achievement. Contrast this to a low-cost program that costs, say, $5 per student per year.

$100 is less than 1% of the ordinary cost of educating a student, on average. $5 is less than 0.05%, of course. But in the larger scheme of things, who cares? Using a proven whole-school reform model might perhaps increase the per-student cost from $11,000 to $11,100. Adding the $5 low-cost intervention could increase per-student costs from $11,000 to $11,005. From the perspective of a principal who has a fixed budget, and simply does not have $100 per student to spend, the whole-school approach may be infeasible. But from the system perspective, the difference between $11,000 and $11,100 (or $11,005) is meaningless if it truly increases student achievement. Our goal must be to make meaningful progress in reducing gaps and increasing national achievement, not make a small difference that happens to be very inexpensive.
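
For readers who want to check the arithmetic, here is a minimal sketch in Python. The dollar figures are the approximate ones quoted in this post; the names (BASE_COST, INTERVENTIONS, share_of_base) are purely illustrative and not part of any official cost model.

```python
# Approximate figures quoted in the post; all names here are illustrative.
BASE_COST = 11_000  # rough annual cost of educating one U.S. student, in dollars

# Added per-student, per-year cost of each kind of intervention.
INTERVENTIONS = {
    "whole-school reform": 100,
    "low-cost intervention": 5,
}


def share_of_base(added_cost, base=BASE_COST):
    """Return the added cost as a percentage of the base per-student cost."""
    return 100 * added_cost / base


for name, cost in INTERVENTIONS.items():
    total = BASE_COST + cost
    print(f"{name}: ${total:,} per student, an increase of {share_of_base(cost):.3f}%")
```

Either way, the increase is a fraction of one percent of what is already being spent, which is why the comparison matters so little at the system level.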

I once saw a film in England on the vital role of carrier pigeons in the British army in World War II. I’m sure those pigeons played their part in the victory, and they were very cost-effective. But ultimately, it was expensive tanks and planes and ships and other weapons, and courageous men and women, that won the war, not pigeons, and piling up small (even if effective) interventions was just not going to do it.

We should be in a war against inequality, disadvantage, and mediocre outcomes in education. Winning it will require identification and deployment of whole-school, whole-district, and whole-state approaches that can be reliably replicated and intelligently applied to ensure positive, widespread improvements. If we just throw pigeon-sized solutions at huge and tenacious problems, our difficulties are sure to come home to roost.

Luther Burbank and Evidence in Education

The first house my wife and I owned was a corner rowhouse in Baltimore. The house was small and the yard was small, but there was a long fenceline with no trees overhead. We decided to put in an orchard. By the time we were done, we’d planted apples, pears, peaches, cherries, Italian and Santa Rosa plums, blueberries, and Concord grapes. Some worked out better than others, but at harvest season we were picking and canning a lot of fruit.

My involvement with our tiny orchard led me to find out about Luther Burbank, the botanist who in the late 1800s developed many of the fruit varieties we know today. He and later botanists over the years developed a cornucopia of fruits, vegetables, and flowers of all kinds.

Burbank had nothing to do with educational research, as far as I know, but the process he developed to create and test many fruit varieties has lessons for us in education.

Burbank’s better-tasting or hardy-growing or heat-tolerant varieties enabled fruit to improve dramatically in diversity and quality and to diminish in cost. All to the good. Some of the new fruits were enthusiastically adopted by farmers, because they knew their customers would buy them. Some did not work out, because they were not so tasty, difficult or expensive to grow, or hard to ship. But the ones that did work out, like the delicious Santa Rosa plums we grew in profusion in Baltimore, changed the world. Burbank developed the potato from which today’s Russet Burbank descends, for example, a variety bred in the decades after the Irish potato famine.

Now imagine that Burbank’s fruit trees were instead treated like new educational programs. Opponents of innovative fruits would try to get governments to ban them. Proponents might try to get governments to require them. Governments themselves might try to regulate them.

As a result, fruit tree development might have withered or died on the vine.

In education, we need to adopt the approaches agriculture has used since the time of Benjamin Franklin to promote ever-better seeds, varieties, and techniques. Government, publishers, software developers, and others should be in a constant process of creating and evaluating effective methods. Governments should set standards for evaluation as well as funding a great deal of it. When proven programs exist, government at all levels should help make educators aware of the programs and the evidence, much as agricultural extension agents do with farmers.

What government should not do is require schools or districts to adopt particular programs. Instead, it should provide information and incentives, but leave the choices up to the schools. Agricultural extension agents tell farmers about new research, but it is up to the farmers to use it or not. If they choose not to do so but their neighbors do, and their neighbors get bigger yields and higher profits, they are likely to change their minds soon enough.

Similarly, government should not limit the creativity and ideas that are being explored in order to promote one particular design. Innovations should be field driven and address a broad range of issues in different ways to discover what works. Imagine if Burbank and his colleagues were only permitted to experiment with one variety of produce. What might have happened if the Russet potato had never been discovered?

In education, government needs to jumpstart research, development, and dissemination, and it needs to honestly present the evidence and provide resources for educators to use to adopt and perhaps further test innovations. Burbank’s brilliant hybrids would have been local curiosities if the Stark Seed Company had not provided, well, seed funding and marketing support. Changing metaphors, government needs to provide the field, the ball, the rules, and serve as referee and cheerleader, but then let the teams compete in the full light of public view.

America’s students can become the best in the world, if we use the same strategies that have made America strong economically. Create policies favoring innovation and use of proven programs and then stand back. That’s all Luther Burbank needed to revolutionize fruit tree production, and it’s all educational research and development needs to transform teaching and learning.

Scaling Up: Penicillin and Education

In 1928, the Scottish scientist Alexander Fleming discovered penicillin. As the story goes, he discovered it by accident, when he left a petri dish containing bacteria on his desk overnight and the next morning found that it was contaminated with a mold that had killed the bacteria. Fleming isolated the mold and recognized that if it could kill bacteria, it might be useful in curing many diseases.

Early on it was clear that penicillin had extraordinary possibilities. In World War I, more soldiers and civilians had been killed by bacterial diseases than were killed by bullets. What if these diseases could be cured? Early tests showed very promising effects.

Yet there was a big problem. No one knew how to produce penicillin in quantity. Very small experiments established that penicillin had potential for curing bacterial infections and was not toxic. However, the total world supply at the onset of World War II was about enough for a single adult. The impending need for penicillin was obvious, but it still was not ready for prime time.

American and British scientists finally began to work together to find a way to scale up production of penicillin. Eventually, the Merck Company developed a mass production method, and was making billions of units by D-Day.

The key dynamic of the penicillin story has much in common with an essential problem of education reform. The Merck work did not change the structure of penicillin itself, but Merck scientists did a lot of science and experimentation to find strains that were stable and replicable. In education reform, it is equally the case that the development and initial evaluation of a given program may be a very different process from that intended to carry out large-scale evaluations and scaling up of proven programs.

In some cases, different organizations may be necessary to do large-scale evaluation and implementation, as was the case with Merck and Fleming, and in other cases the same organization may carry through the development, initial evaluation, large-scale evaluation, and dissemination. Whoever is responsible for the various steps, their requirements are similar.

At small scale, innovators are likely to work in schools nearby, where they can frequently visit schools, see what is going on, hear teachers’ perspectives, and change strategies midcourse in response to what is going on. At small scale, programs might vary a great deal from class to class or school to school. Homemade measures, opinions, observations, and other informal indicators may be all developers need or want. From a penicillin perspective, this is still the Fleming level.

When a program moves to the next level, it may be working in many schools or distant locations, and the approach must change substantially. This is the Merck stage of development in penicillin terms. Developers must have a very clear idea of what the program is, and then provide student materials, software, professional development, and coaching directed toward helping teachers to enact the program effectively. Rather than adapting a great deal to the desires or ideas of every school or teacher, developers can ask principals and teachers to vote on participation, with an understanding that if they decide to participate, they commit to follow the program more or less as designed, with reasonable variations in light of unique characteristics of the school (e.g., urban/rural, presence of English learners, or substantial poverty). Professional development and coaching need to be standardized, with room for appropriate adaptations. Organizations that provide large-scale services need to learn how to manage functions such as finance, human resources, and IT.

As programs grow, they should seek funding for large-scale, randomized evaluations, ideally by third party evaluators.

In order to get to the Merck level in education reform, we must be ready to build robust, flexible, self-sustaining organizations, capable of ensuring positive impacts of educational programs on a broad scale. Funding from government and private foundations is needed along the way, but the organizations ultimately must be able to operate mostly or entirely on revenues from schools, especially Title I or other funds likely to be available in many or most schools.

Over the years, penicillin has saved millions of lives, due to the pioneering work of Fleming and the pragmatic work of Merck. In the same way, we can greatly enhance the learning of millions of children, combining innovative design and planful, practical scale-up.